What is the recommended setup if you have a search head and saved searches that write data to a summary index?
I read that search-time configurations should reside on the search head.
So what I would like to do is configure the saved searches that write to a summary index on the search head, and have the summary data written to summary indexes on the indexers that I search.
Is this possible/the right approach?
Thank you for helping me.
Chris
Totally doable. There have been two situations recently where I have deployed Splunk in exactly the configuration you are wondering about. The key thing is to make your search head a forwarder (yeah, that sounds kinda weird). When summary indexing jobs run on the search head, "stash" files are created. Stash files hold the results of your summary indexing search, and they get indexed; that's how summary indexes are created and populated. If your jobs run on a search head that is configured as a forwarder (in my case using AutoLB, switching between indexers every 60 seconds), the summary index job results will be sprayed to your indexers.
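Roughly, that forwarder setup on the search head could look like the outputs.conf sketch below; the group name, server names, and ports are placeholders, so adjust them for your own environment.

    # outputs.conf on the search head (group/server names are placeholders)
    [tcpout]
    defaultGroup = my_indexers
    # Forward everything; don't keep a local copy on the search head
    indexAndForward = false

    [tcpout:my_indexers]
    server = indexer1.example.com:9997, indexer2.example.com:9997
    # Auto load-balance across the indexers, switching every 60 seconds
    autoLB = true
    autoLBFrequency = 60

With indexAndForward = false the search head indexes nothing locally, so the summary events only live on the indexers.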
There is a bit of an architectural debate on this implementation scheme. What if your summary index became unavailable? If your SI lived on your search head and the search head went down, you'd have no summary indexes at all. If your SI was sprayed across N indexers and one indexer went down, you'd have "part" of your summary data. Is having no summary data better than having part of it? Something for you to consider. Personally, I like spraying summary indexing results from my search head to my indexers; that way I get the benefit of all those indexer cores when I'm searching the summary indexes.
At Splunk's User Conference, Karandeep (Deep) Bains and I set up a "Solutions Lab" in Amazon EC2. Some Splunk Solutions were "large" and some were "small": the large ones had four indexers, the small ones had two. Each had a search head that ran the summary jobs and acted as the search UI for many of the Splunk apps we were demonstrating.
This image shows our architecture for each "size" of solution.
As Michael says, this is certainly a supported deployment.
There are three basic ways in which summary indexing can be set up in a distributed environment: the summarizing searches can run on a search head that (1) keeps the summary index locally, (2) forwards the results to the main indexers, or (3) forwards the results to a dedicated set of summary indexers.
We don't usually recommend setting up independent summarization on the indexers (as described by zscgeek), for both management and performance reasons. From a management perspective, it's harder, though not impossible, to make sure that summary searches built on a search head will run identically on every indexer. From a performance perspective, you give up some of the "compression" that summarization is meant to provide. In any of the three approaches above, there is exactly one summary for any given time period. If all the indexers summarize independently, each time period instead has n summaries, one per indexer, which uses more space and requires extra computation to recombine them at search time.
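For concreteness, a summary-populating saved search defined on the search head might look something like the savedsearches.conf sketch below; the stanza name, search string, schedule, and index name are all invented for illustration.

    # savedsearches.conf on the search head (stanza name, search, schedule and index are examples)
    [Hourly web error summary]
    search = index=web sourcetype=access_combined status>=500 | sistats count by host
    cron_schedule = 5 * * * *
    enableSched = 1
    action.summary_index = 1
    action.summary_index._name = summary_web

The sistats version of stats stores its results in a form meant for summary indexing, and the resulting events then leave the search head through whatever forwarding you have configured.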
I am setting up a saved, scheduled search on a dedicated search head whose results I want forwarded to the indexers. I am using a custom summary index that exists on the indexers but not on the search head. What should I set action.summary_index._name to?
Great answer; I'm curious how you implement this in practice. Do you have an inputs.conf on your search head that points at the stash files? If so, how do you prevent the search head from searching its own locally stored stash files?
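For reference, a default Splunk install already ships a batch input that consumes the spool directory where the stash files are written, so there is normally nothing extra to add; on the installs I've looked at, $SPLUNK_HOME/etc/system/default/inputs.conf contains a stanza along these lines.

    # Default spool input that picks up stash files (from etc/system/default/inputs.conf)
    [batch://$SPLUNK_HOME/var/spool/splunk]
    move_policy = sinkhole
    crcSalt = <SOURCE>

If the search head is configured as a forwarder with indexAndForward = false, as in the outputs.conf sketch above, those events are forwarded rather than indexed locally, so the search head never ends up searching its own copy of the stash data.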
Great answer!
Thanks, you put a lot of effort into that answer.
Note that you don't have to distribute your summary data back to the main indexers. You could run your summarizing jobs on a dedicated summarizer and have it forward the data over to a dedicated summary reporting cluster as well. It might make sense to store the summary data away from the raw index data if, e.g., almost all the end user reporting is off the summary and you can afford faster disks for that smaller amount of data, while the raw data is mostly used by jobs (including the summarizations themselves).
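That variant is the same forwarding pattern as the outputs.conf sketch earlier, just pointed at the summary reporting cluster instead of the main indexers; the group and host names below are again placeholders.

    # outputs.conf on the dedicated summarizer (group/host names are placeholders)
    [tcpout]
    defaultGroup = summary_reporting
    indexAndForward = false

    [tcpout:summary_reporting]
    server = summary-idx1.example.com:9997, summary-idx2.example.com:9997
    autoLB = true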