Solved: question on summary indexing

jiaqya · 12-02-2017

I have a couple of questions:

1st question:
I have a large amount of data on which I run a summary-indexing search every day that collects 24 hours of data (-2d@d to -1d@d). Right now it performs well, since there isn't much data yet.

I'm concerned that as the data grows, performance will eventually suffer. So my question is: can I run a weekly summary-indexing search on top of the daily summary index that is already running? Will that help performance as the data grows?

2nd question:
Also, how do I go about backfilling older data into the new summary index? Since the current summary index search is daily, if I run the Python script to backfill, will it add the old data in daily increments or all of the previous data at once?

Thanks in advance for your clarifications.

woodcock · 12-02-2017

A1: Yes, multiple rollup layers of summary indexes (SI) are very common, e.g. raw->daily plus daily->monthly.
A2: The backfill script checks each time period, and if there are ANY events in that period for this search, it skips that period. So it is safe to run multiple times, even with overlapping dates. A different rollup will be a different search, so they will not step on each other. Really, though, they should write to different SIs, too.
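A minimal Python model of the skip rule A2 describes, assuming one period per day to match a daily schedule. The names (`backfill`, `already_summarized`, `run_search`) are hypothetical stand-ins, not the real backfill script's interface. In this model each period is handled separately, so old data is filled in one day at a time, and any day that already has summary events is skipped.

```python
from datetime import date, timedelta
from typing import Callable


def backfill(start: date, end: date, already_summarized: set,
             run_search: Callable[[date], None]) -> list:
    """Model of the skip-if-any-events backfill rule (hypothetical names).

    Walks one day at a time over [start, end). A day that already has
    summary events for this search is skipped, so re-running with
    overlapping dates does not write duplicates.
    """
    filled = []
    day = start
    while day < end:
        if day not in already_summarized:  # no summary events yet for this day
            run_search(day)                # stand-in for the populating search
            already_summarized.add(day)
            filled.append(day)
        day += timedelta(days=1)
    return filled


# Nov 2 was already summarized, so only Nov 1 and Nov 3 are filled;
# an identical second run finds every day populated and does nothing.
done = {date(2017, 11, 2)}
print(backfill(date(2017, 11, 1), date(2017, 11, 4), done, lambda d: None))
print(backfill(date(2017, 11, 1), date(2017, 11, 4), done, lambda d: None))
```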


ddrillic · 12-04-2017

I think that running it more frequently, and avoiding cluster latency issues by distancing yourself a bit time-frame-wise, is a good way to do it.
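One way to read the "distance yourself a bit" advice, sketched in Python under the assumption of an hourly populating search: summarize a complete hour that ended `lag_hours` ago, so late-arriving events have time to be indexed. With `lag_hours=1` the window matches `earliest=-2h@h latest=-1h@h`. The function name and parameter are illustrative only, and DST is ignored.

```python
from datetime import datetime, timedelta


def lagged_hour_window(now: datetime, lag_hours: int = 1):
    """Return (earliest, latest) for an hourly summary that stays
    lag_hours behind the current hour (naive times, DST ignored)."""
    current_hour = now.replace(minute=0, second=0, microsecond=0)  # like @h
    latest = current_hour - timedelta(hours=lag_hours)
    earliest = latest - timedelta(hours=1)
    return earliest, latest


# A run at 10:17 with lag_hours=1 summarizes 08:00-09:00,
# the same hour that earliest=-2h@h latest=-1h@h would select.
print(lagged_hour_window(datetime(2017, 12, 4, 10, 17)))
```

A larger lag trades freshness of the summary for more tolerance of indexing delay.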

About the latency: how can we avoid data loss in the summary indexes when there is indexing latency in the cluster?

jiaqya · 12-03-2017

Gregg, great. For point 2, I got it all working, thanks.

For point 1, just one clarification: if I create the first SI from raw data, running every day, then when I run the weekly or monthly search on the SI, should I specify the date range in the search, or use the timechart span?

I.e., in the query, should I collect by timechart with span=7d for the weekly run, or just collect whatever is in the SI on a weekly schedule?

or

specify earliest and latest in the search as -7d@d and -1d@d to collect all of it weekly?

Which is the right option to go for?
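Whichever option is chosen, the window arithmetic is worth checking. Assuming Splunk's usual snap semantics (`-7d@d` means seven days back, snapped to midnight; `@d` means today's midnight; latest is exclusive), `-7d@d` to `-1d@d` covers six days, while `-7d@d` to `@d` covers seven. This Python sketch models a daily SI as hypothetical (day, count) rows and rolls them up over both windows; the helper names are illustrative and DST is ignored.

```python
from datetime import datetime, timedelta


def snap_days_ago(now: datetime, days: int) -> datetime:
    """Approximate Splunk's -<days>d@d (days=0 gives @d): step back, snap to midnight."""
    return (now - timedelta(days=days)).replace(hour=0, minute=0, second=0, microsecond=0)


def rollup(daily_rows, earliest: datetime, latest: datetime) -> int:
    """Sum the counts of hypothetical (day, count) daily-SI rows in [earliest, latest)."""
    return sum(count for day, count in daily_rows if earliest <= day < latest)


now = datetime(2017, 12, 4, 2, 0)
# Seven daily summary rows, one per day from Nov 27 through Dec 3.
rows = [(snap_days_ago(now, d), 10 * d) for d in range(1, 8)]

six_days = (snap_days_ago(now, 7), snap_days_ago(now, 1))    # -7d@d .. -1d@d
seven_days = (snap_days_ago(now, 7), snap_days_ago(now, 0))  # -7d@d .. @d

print(rollup(rows, *six_days))    # 270: Dec 3 is left out
print(rollup(rows, *seven_days))  # 280: all seven days
```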

woodcock · 12-04-2017

For hourly summaries this doesn't matter, but for daily and up, be aware that the time zone (TZ) of the user who runs the populating search introduces a bias toward that particular TZ. So if you are a multi-site (multi-TZ) operation, people in other TZs may be confused because their raw searches for "yesterday" or "last week" will not match yours, since a TZ bias has been baked into the summary.
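To see the bias concretely, this Python sketch computes "yesterday" (`-1d@d` to `@d`) as seen from two users' time zones and converts both to UTC. Fixed offsets (UTC and UTC-5) are assumed for simplicity, so DST is ignored; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone


def yesterday_in_utc(now_utc: datetime, tz: timezone):
    """'Yesterday' as a user in tz sees it, returned as UTC bounds."""
    local_midnight = now_utc.astimezone(tz).replace(hour=0, minute=0, second=0, microsecond=0)
    start = local_midnight - timedelta(days=1)
    return start.astimezone(timezone.utc), local_midnight.astimezone(timezone.utc)


now = datetime(2017, 12, 4, 12, 0, tzinfo=timezone.utc)
print(yesterday_in_utc(now, timezone.utc))                   # Dec 3 00:00 to Dec 4 00:00 UTC
print(yesterday_in_utc(now, timezone(timedelta(hours=-5))))  # Dec 3 05:00 to Dec 4 05:00 UTC
```

A daily summary populated by the UTC user therefore buckets a five-hour slice of events into a different day than the UTC-5 user's raw search for "yesterday" does.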

And yes, for each rollup layer's populating search, you need to wait for the lower layer's populating searches to finish and leave a little buffer at the end (e.g. wait an hour after the last hourly populating search for the day starts before running your daily populating search).
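The buffer rule can be sketched in Python, assuming an hourly populating search that runs at a fixed minute past each hour and summarizes the complete hour ending `lag_hours` before the hour it runs in (lag 1 corresponds to `earliest=-2h@h latest=-1h@h`). The helper and its defaults are hypothetical; it returns the earliest reasonable start for the daily search that covers a given day.

```python
from datetime import date, datetime, time, timedelta


def daily_start_time(day: date, run_minute: int = 5, lag_hours: int = 1,
                     buffer: timedelta = timedelta(hours=1)) -> datetime:
    """Earliest start for the daily populating search that covers `day`.

    The hourly run summarizing [23:00, 24:00) of `day` starts lag_hours after
    midnight, at run_minute past the hour; the daily search then waits an
    extra `buffer` beyond that start (one hour by default).
    """
    end_of_day = datetime.combine(day + timedelta(days=1), time(0))
    last_hourly_start = end_of_day + timedelta(hours=lag_hours, minutes=run_minute)
    return last_hourly_start + buffer


print(daily_start_time(date(2017, 12, 3)))  # 2017-12-04 02:05:00
```

The same reasoning applies one layer up: a weekly or monthly rollup should start only after the last daily run it depends on, plus a buffer.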

jiaqya · 02-23-2018

I did some more work on summary indexing and understand it much better now, including how to use it. I have used it successfully in many of my reports with large data and am happy with the results. Summary indexing rocks 🙂
