Knowledge Management

Summary index a query that runs weekly and gathers the last month's data.

joydeep741
Path Finder

Query

index=dotcom source=system exception earliest=-30d latest=now
| stats earliest(_time) as FirstOccurence by class exception
| eval SevenDaysBack = relative_time(now(), "-7d@d")
| where FirstOccurence > SevenDaysBack
| table FirstOccurence class exception
| fieldformat FirstOccurence = strftime(FirstOccurence, "%d/%m/%y %H:%M:%S")

This query takes into account all the events in the last 30 days, and it takes forever to run. Can someone guide me on how I should use the summary index technique to speed up the query?


cpetterborg
SplunkTrust

You should run a summary search at least once a day for it to make much difference in run time.

In the summary search, you probably only need the part of your query through the stats command.

Save that search. Schedule it to run once a day, for example at 2am. Set the search period to "yesterday" so that you won't get any data overlap. Select the summary indexing checkbox. Add an additional field named something like "report" and give it a value that helps you distinguish which report the data came from. That is done near the bottom of the summary indexing section.
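Based on the original query, the scheduled summary search might look something like this (a sketch, keeping only the part through stats; the field names come from the question):

```
index=dotcom source=system exception earliest=-1d@d latest=@d
| stats earliest(_time) as FirstOccurence by class exception
```

With a "yesterday" time range, each run processes one day of raw events instead of 30.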

This should set up the search properly for you. When you run a search using that data, include the index in the search, like: index=summary

Also add the report name to look only for that data from the summary index, like: report=myreport

Then your search can use the remaining piped commands to do your calculations and come up with your desired results. You can specify the last 30 days and it will only have to look through a much smaller set of data, unless you have a very large number of distinct class/exception combinations.
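Assuming the summary search was saved with report=myreport as described above, the 30-day report could then be sketched as follows (min() re-aggregates the per-day earliest values back into a single first occurrence):

```
index=summary report=myreport earliest=-30d@d latest=now
| stats min(FirstOccurence) as FirstOccurence by class exception
| eval SevenDaysBack = relative_time(now(), "-7d@d")
| where FirstOccurence > SevenDaysBack
| table FirstOccurence class exception
| fieldformat FirstOccurence = strftime(FirstOccurence, "%d/%m/%y %H:%M:%S")
```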

If you need to backfill the summary index, you can. That is another discussion. If you have additional questions where I have been unclear, please comment.


joydeep741
Path Finder

Sincere thanks for explaining in detail; it's really helpful.

I am trying to understand how summary indexing works.
Is it like this: every time the search runs (say at 2am daily), some data is added to the summary index, and after some days the entire index is ready? And once it's ready, I can do further calculations and operations on it?

Can you give me a brief overview of how it works? And how would I know when the index is ready?


cpetterborg
SplunkTrust

Basically, when you run a search to create summary data, you are creating a bunch of statistics from the data: a count, an average, a last, a first, a min, a max, or any number of other "summaries" of the events you are summarizing. The details are not in the summary. So when you search the summary index, you are looking at a set of statistics for the data set over time, like the count of events per host over the time period, for example. This data can then be searched, summed, etc., by host over a set of days. I hope this is simple enough. If not, let me know.

You add a field like the "report" name to separate the data that you are putting into the summary index. Otherwise you could be mixing the data up with other data from the summary index.
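To see what has landed in the summary index so far, one simple check (assuming the extra field was named report=myreport, as suggested earlier) is a daily count:

```
index=summary report=myreport earliest=-30d@d
| timechart span=1d count
```

Days with a zero count are gaps that still need to be backfilled.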

You don't have to wait a month for the summaries to accumulate before you can get a 30-day report. You can backfill the data, once for each day going back into the past, from a command line on a search head. Use the following documentation to see how you can fill in gaps (backfill) in the index:

http://docs.splunk.com/Documentation/Splunk/6.2.2/Knowledge/Managesummaryindexgapsandoverlaps
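The documentation above describes the fill_summary_index.py script, run from the search head's command line. An invocation might look something like this (the app, search name, time range, and credentials here are placeholders for your own values):

```
cd $SPLUNK_HOME/bin
./splunk cmd python fill_summary_index.py -app search \
    -name "my summary search" -et -30d@d -lt @d \
    -j 8 -dedup true -auth admin:changeme
```

The -dedup true flag skips time ranges that already have summary data, which helps avoid the duplication issue discussed below.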

I'm going to have to continue in the next comment....


cpetterborg
SplunkTrust

Don't worry about not getting it right the first time; you can re-summarize the data, and summary indexing isn't counted against your license. Just change the "report" name and re-run the backfill script to get the data right. I suggest not doing too much data at once, so that you don't put too much "bad" data in the summary index, because summary data never expires. You will also want to look at the data going back into the past to see where you need to start and end. You have to specify the time period for the backfill, and sometimes you don't get the boundaries quite right. Since you don't want any day's data duplicated, err on the conservative side, then add an additional day at the end or the beginning as needed. You'll get the hang of it. As the backfill runs, it prints progress information to the command line, so you can see how far along it is and when it has finished.

Please ask more questions if you need to.
