I have a search that fetches about 5 GB of application logs. To avoid putting load on the Splunk instance and slowing down search output, I am planning to use "Summary Indexing" with the new SI commands. However, to use the SI commands, I have to use a scheduled report.
My question is: if I am scheduling the report to run every day, why will the SI commands help improve Splunk's performance?
The documentation explains it very nicely in "Use summary indexing for increased reporting efficiency":
Use summary indexing to efficiently report on large volumes of data. With summary indexing, you set up a frequently-running search that extracts the precise information you want. Each time this search is run, its results are saved into a summary index that you designate. You can then run searches and reports on this significantly smaller (and thus seemingly "faster") summary index. And what's more, these reports will be statistically accurate because of the frequency of the index-populating search (for example, if you want to manually run searches that cover the past seven days, you might run them on a summary index that is updated on an hourly basis).
Summary indexing allows the cost of a computationally expensive report to be spread over time. In the example we've been discussing, the hourly search to populate the summary index with the previous hour's worth of data would take a fraction of a minute. Generating the complete report without the benefit of summary indexing would take approximately 168 (7 days * 24 hrs/day) times longer.
Perhaps an even more important advantage of summary indexing is its ability to amortize costs over different reports, as well as for the same report over a different but overlapping time range. The same summary data generated on a Tuesday can be used for a report of the previous 7 days done on the Wednesday, Thursday, or the following Monday. It could also be used for a monthly report that needed the average response size per day.
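To make the reuse point concrete, here is a minimal Python sketch (not Splunk code). It assumes a hypothetical set of daily summary rows that store an event count and a total response size per day; the names `daily_summary` and `avg_response_size` and all the numbers are made up for illustration. The same rows, computed once, answer several overlapping reports:

```python
from datetime import date, timedelta

# Hypothetical daily summary rows: day -> (event_count, total_response_bytes).
# In this sketch every event is 512 bytes, so any correct average is 512.0.
daily_summary = {
    date(2024, 1, 1) + timedelta(days=i): (100 + i, (100 + i) * 512)
    for i in range(31)
}

def avg_response_size(start, days):
    """Average response size over a range, built only from summary rows."""
    rows = [daily_summary[start + timedelta(days=i)] for i in range(days)]
    events = sum(count for count, _ in rows)
    total_bytes = sum(size for _, size in rows)
    return total_bytes / events

# Three reports with overlapping ranges, all served by the same summary rows.
week_from_wed = avg_response_size(date(2024, 1, 3), 7)
week_from_thu = avg_response_size(date(2024, 1, 4), 7)
whole_month = avg_response_size(date(2024, 1, 1), 31)
print(week_from_wed, week_from_thu, whole_month)
```

Note that the sketch stores counts and totals rather than pre-computed averages, so an average over any range stays correct. As far as I understand it, keeping that kind of bookkeeping is what the SI commands take care of for you.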
@keishamtcs I would recommend you go through the Splunk .conf2017 session "Searching FAST: How to Start Using tstats and Other Acceleration Techniques".
Please see the simple and clear answers below on why you need a summary index and how it works 🙂
The concept of summary indexing is that some data loses its value over time at the granularity it was originally collected at, and at a certain point you can summarize it and still have the answers you need. A great example is CPU metrics. You collect them every 30 seconds (maybe) throughout the day. That equates to 2,880 records per day for Splunk to retrieve. If you searched over 30 days, it would retrieve 86,400 records.
That level of granularity is probably only relevant for a single day. After that, a per-hour summary (maybe min, max and avg) is good enough. Using summary indexing, you can run a search once a day that retrieves all the records (at 30-second intervals) and summarizes them per hour, so that when you search the summary index you only need to retrieve 24 records per day. This is what makes the search more efficient and improves performance.
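The arithmetic above can be checked with a small Python sketch (again, not Splunk code). It assumes synthetic CPU readings for a single host at 30-second intervals; `make_samples` and `summarize_hourly` are hypothetical names used only for this illustration:

```python
from collections import defaultdict

SAMPLE_INTERVAL_S = 30
SECONDS_PER_DAY = 24 * 60 * 60

def make_samples():
    """One day of synthetic readings: (seconds_since_midnight, cpu_pct)."""
    return [
        (t, (t // SAMPLE_INTERVAL_S) % 100)
        for t in range(0, SECONDS_PER_DAY, SAMPLE_INTERVAL_S)
    ]

def summarize_hourly(samples):
    """Roll raw samples up into one min/max/avg row per hour."""
    buckets = defaultdict(list)
    for t, cpu in samples:
        buckets[t // 3600].append(cpu)
    return [
        {"hour": h, "min": min(v), "max": max(v), "avg": sum(v) / len(v)}
        for h, v in sorted(buckets.items())
    ]

raw = make_samples()
summary = summarize_hourly(raw)

print(len(raw), len(summary))            # raw vs. summary rows per day
print(len(raw) * 30, len(summary) * 30)  # rows read by a 30-day search
```

In this sketch a 30-day search over the summary reads 720 rows instead of 86,400, and the min, max and avg per hour are still available.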
You can play the same game over a longer time span also, where you create a per day statistic for longer term (months or years) trend analysis.
The key point here is that you index all the data initially into your wineventlog index (as an example), then search that index hourly or daily and write a summary into the summary index. That also allows you to set a short retention period on the original index to minimize disk usage.
Hope this helps.