Solved: Summary indexing on a search head

chris · ‎08-20-2010

What is the recommended setup if you have a search head and saved searches that write data to a summary index?

I read that search-time configurations should reside on the search head.

So what I would like to do is configure the saved searches that write to a summary index on the search head, and have them write their summary data to summary indexes on the indexers that I search.

Is this possible/the right approach?

Thank you for helping me.

Chris

Michael_Wilde · ‎08-20-2010

Totally doable. There have been two situations recently where I have deployed Splunk in the exact configuration that you are wondering about. The key thing is to make your search head a forwarder. Yeah, that sounds kinda weird. When summary indexing jobs run on the search head, "stash" files are created. Stash files hold the results of your summary indexing search, and those files are then indexed; that's how summary indexes are created and populated. If your jobs run on a search head that is configured as a forwarder (in my case using AutoLB, switching between indexers every 60 seconds), the summary index job results will be sprayed to your indexers.
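
A minimal sketch of what "search head as a forwarder" can look like in outputs.conf. This is an illustration, not the configuration used in the deployments described here: the group name and host names are hypothetical, port 9997 assumes the indexers have receiving enabled on that port, and the 60-second value simply mirrors the AutoLB interval mentioned above.

```
# outputs.conf on the search head (sketch; group and host names are hypothetical)

# Do not keep a locally indexed copy on the search head.
[indexAndForward]
index = false

[tcpout]
defaultGroup = primary_indexers
# Turn off the forwarded-index filters so that events for all indexes are sent on.
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:primary_indexers]
# Assumed receiving port; each indexer must be listening on it.
server = idx1.example.com:9997, idx2.example.com:9997
# Seconds between switches to another indexer in the list.
autoLBFrequency = 60
```

With a setup along these lines, the stash results are indexed on whichever indexer is currently selected rather than on the search head, which is what spreads the summary data across the indexers.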

There is a bit of an architectural debate on this implementation scheme. What if your summary index became unavailable? If your SI was on your search head and the search head went down, you'd have no summary indexes at all. If your SI was sprayed to N indexers and one indexer went down, you'd have "part" of your summary data. Is having no summary data better than having part of it? Something for you to consider. Personally, I like spraying summary indexing results from my search head to my indexers; that way I get the benefit of all those indexer cores when I'm searching the summary indexes.

At Splunk's User Conference, Karandeep (Deep) Bains and I set up a "Solutions Lab" in Amazon EC2. Some Splunk Solutions were "large", and some were "small". Large solutions had four indexers; small ones had two. Each had a search head that ran summary jobs and acted as the search UI for many of the Splunk apps we were demonstrating.

This image shows our architecture for each "size" of solution.

Stephen_Sorkin · ‎08-20-2010

As Michael says, this is certainly a supported deployment.

There are three basic ways in which summary indexing can be set up in a distributed environment:

1. Perform summary indexing on the search head. This is the simplest deployment and usually works very well. It has the advantage of making use of the disk resources on the search head, which are typically underused.
2. Perform summary indexing on a separate search head, and distribute search from the primary search head to the summarizing search head. This reduces compute burden on the primary search head. This is almost always the best use of a second search head.
3. Perform summary indexing on either a primary or secondary search head and forward the data to the indexers. This is typically the highest performance deployment. Any summary search will be able to retrieve the summary records in parallel from all the indexers.
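
In all three setups the summarizing search itself is an ordinary scheduled saved search on the search head. A rough sketch of such a stanza follows; the stanza name, the search string, the schedule, and the index name are all hypothetical and only illustrate the summary-index action settings.

```
# savedsearches.conf on the summarizing search head (sketch; all names are hypothetical)
[summary_web_errors_hourly]
# si- commands such as sistats produce results intended for summary indexing.
search = index=web status>=500 | sistats count by host
enableSched = 1
# Run at five minutes past each hour, over the previous whole hour.
cron_schedule = 5 * * * *
dispatch.earliest_time = -1h@h
dispatch.latest_time = @h
# Write the results to a summary index.
action.summary_index = 1
action.summary_index._name = summary
```

If the results are forwarded, as in the third setup, the index named in action.summary_index._name needs to exist on the indexers that receive the data.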

We don't usually recommend setting up independent summarization on the indexers (as described by zscgeek). This is for both management and performance reasons. From a management perspective, it's harder, but not impossible, to make sure that summaries built on a search head will work perfectly on the indexers. From a performance perspective, we give up some of the "compression" implied by summarization. In any of the above three approaches, for any given time period there will be exactly one summarization. If all the indexers summarize independently, then each time period will have n summaries, one for each indexer. This uses more space and requires computation to recombine.

platform_pie · ‎08-31-2011

I am setting up a saved, scheduled search on a dedicated search head that I want forwarded to indexers. I am using a custom summary index that exists on the indexers but not on the search head. What should I set action.summary_index._name to?

dbryan · ‎08-17-2012

Great answer; I'm curious how you implement this in practice. Do you have an inputs.conf on your search head that you point to the stash files? If so, how do you prevent the search head from searching its own locally stored stash files?

rroberts · ‎08-11-2011

Great answer!

chris · ‎08-23-2010

Thanks, you put a lot of effort into that answer.

gkanapathy · ‎08-20-2010

Note that you don't have to distribute your summary data back to the main indexers. You could run your summarizing jobs on a dedicated summarizer and have it forward the data over to a dedicated summary reporting cluster as well. It might make sense to store the summary data away from the raw index data if, e.g., almost all the end user reporting is off the summary and you can afford faster disks for that smaller amount of data, while the raw data is mostly used by jobs (including the summarizations themselves).
