Getting Data In

Summary indexing on a search head

Motivator

What is the recommended setup if you have a search head and saved searches that write data to a summary index?

I read that search-time configurations should reside on the search head.

So what I would like to do is configure the saved searches that write to a summary index on the search head, and have the summary data written to summary indexes on the indexers that I search.

Is this possible/the right approach?

Thank you for helping me.

Chris

1 Solution

Splunk Employee

Totally doable. I have recently deployed Splunk twice in the exact configuration you are asking about. The key thing is to make your search head a forwarder (yeah, that sounds kinda weird). When summary indexing jobs run on the search head, "stash" files are created. Stash files hold the results of your summary indexing search, and they are then indexed; that's how summary indexes are created and populated. If your jobs run on a search head that is configured as a forwarder (in my case using AutoLB, switching between indexers every 60 seconds), the summary index job results will be sprayed across your indexers.
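As a rough illustration of the forwarder setup described above, here is a minimal outputs.conf sketch for the search head. The group name `my_indexers` and the host names are hypothetical, port 9997 is only the conventional receiving port, and `autoLBFrequency = 60` is chosen to match the 60-second switching mentioned above. It assumes the indexers already have receiving enabled and the target summary index defined; check the settings against the outputs.conf spec for your Splunk version.

```
# outputs.conf on the search head (sketch; group and host names are hypothetical)

# Do not keep a local indexed copy on the search head
[indexAndForward]
index = false

[tcpout]
defaultGroup = my_indexers
# Forward all indexes, including summary indexes
forwardedindex.filter.disable = true
indexAndForward = false

[tcpout:my_indexers]
# Auto load balancing across the indexers, switching every 60 seconds
server = idx1.example.com:9997, idx2.example.com:9997
autoLBFrequency = 60
```

With something like this in place, the summary results are forwarded to the indexers rather than indexed locally, so searches against the summary index are served by the indexers.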

There is a bit of an architectural debate about this implementation scheme. What if your summary index became unavailable? If your SI was on your search head and the search head went down, you'd have no summary data at all. If your SI was sprayed across N indexers and one indexer went down, you'd have "part" of your summary data. Is having no summary data better than having part of it? Something for you to consider. Personally, I like spraying summary indexing results from my search head to my indexers; that way I get the benefit of all those indexer cores when I'm searching the summary indexes.

At Splunk's User Conference, Karandeep (Deep) Bains and I set up a "Solutions Lab" in Amazon EC2. Some Splunk Solutions were "large", and some were "small". The large ones had four indexers; the small ones had two. Each had a search head that ran summary jobs and acted as the search UI for many of the Splunk apps we were demonstrating.

This image shows our architecture for each "size" of solution.


Splunk Employee

As Michael says, this is certainly a supported deployment.

There are three basic ways in which summary indexing can be set up in a distributed environment:

  1. Perform summary indexing on the search head. This is the simplest deployment and usually works very well. It has the advantage of making use of the disk resources on the search head, which are typically underused.
  2. Perform summary indexing on a separate search head, and distribute search from the primary search head to the summarizing search head. This reduces compute burden on the primary search head. This is almost always the best use of a second search head.
  3. Perform summary indexing on either a primary or secondary search head and forward the data to the indexers. This is typically the highest performance deployment. Any summary search will be able to retrieve the summary records in parallel from all the indexers.

We don't usually recommend setting up independent summarization on the indexers (as described by zscgeek). This is for both management and performance reasons. From a management perspective, it's harder, but not impossible, to make sure that summaries built on a search head will work perfectly on the indexers. From a performance perspective, we give up some of the "compression" implied by summarization. In any of the above three approaches, for any given time period there will be exactly one summarization. If all the indexers summarize independently, then each time period will have n summaries, one for each indexer. This uses more space and requires computation to recombine.

New Member

I am setting up a saved, scheduled search on a dedicated search head that I want forwarded to indexers. I am using a custom summary index that exists on the indexers but not on the search head. What should I set action.summary_index._name to?
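For reference, a saved search that writes to a custom summary index is usually defined along the following lines in savedsearches.conf on the search head. This is only a sketch: the stanza name, the search string, the schedule, and the index name `my_custom_summary` are all hypothetical. It assumes the search head forwards its stash output to the indexers as described in the accepted solution, in which case `action.summary_index._name` would plausibly be the name of the index as it is defined on the indexers. Because the search head UI may not list an index that is not defined locally, the value may need to be set in the file directly.

```
# savedsearches.conf on the search head (sketch; all names below are hypothetical)
[Summary - web errors by host]
search = index=web status>=500 | sistats count by host
enableSched = 1
cron_schedule = */15 * * * *
dispatch.earliest_time = -15m@m
dispatch.latest_time = @m
# Write the results to a summary index
action.summary_index = 1
# Assumed: the name of the summary index as defined on the indexers
action.summary_index._name = my_custom_summary
```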


Path Finder

Great answer; I'm curious how you implement this in practice. Do you have an inputs.conf on your search head that you point to the stash files? If so, how do you prevent the search head from searching its own locally stored stash files?

Splunk Employee

Great answer!


Motivator

Thanks, you put a lot of effort into that answer.


Splunk Employee

Note that you don't have to distribute your summary data back to the main indexers. You could run your summarizing jobs on a dedicated summarizer and have it forward the data over to a dedicated summary reporting cluster as well. It might make sense to store the summary data away from the raw index data if, e.g., almost all the end user reporting is off the summary and you can afford faster disks for that smaller amount of data, while the raw data is mostly used by jobs (including the summarizations themselves).
