Question on summary indexing

Builder

A couple of questions:

1st question:
I have a large amount of data that I summary-index every day, collecting 24 hours of data per run (-2d@d to -1d@d).
For now, since there is not much data, it performs well.

I'm concerned that as the data grows, performance may eventually suffer.
So my question is: can I run a weekly summary index on top of the daily summary index that is already running?
Will this help performance as the data grows?

2nd question:
Also, how do I go about backfilling older data into the new summary index?
Since the current summary index is daily, if I run the Python script to backfill, will it add the old data day by day or all previous data at once?

Thanks in advance for your clarifications.

1 Solution

Esteemed Legend

A1: Yes. It is very common to build multiple rollup layers of SI, for example raw->daily plus daily->monthly.
A2: The backfill script checks each time period, and if there are ANY events in that period for this search, it skips that period. So it is safe to run multiple times, even with overlapping dates. A different rollup is a different search, so the two will not step on each other. Really, though, they should write to different SIs, too.
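To make the "skip any period that already has events" behaviour in A2 concrete, here is a minimal Python sketch of that idea. It is not the actual backfill script: the function name `periods_to_fill` and the `already_summarized` set are hypothetical stand-ins for whatever check the real script performs against the summary index.

```python
from datetime import date, timedelta

def periods_to_fill(start, end, already_summarized):
    """Return the daily periods in [start, end) that still need a populating run.

    `already_summarized` is a set of dates for which the summary index already
    holds at least one event from this search (hypothetical stand-in for the
    real check against the index).
    """
    missing = []
    day = start
    while day < end:
        # Any existing event for the period means the period is skipped.
        if day not in already_summarized:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Two days are already summarized; the backfill only touches the other three.
existing = {date(2021, 2, 1), date(2021, 2, 2)}
first_run = periods_to_fill(date(2021, 2, 1), date(2021, 2, 6), existing)

# Once those periods are filled, an overlapping re-run has nothing left to do.
existing.update(first_run)
second_run = periods_to_fill(date(2021, 2, 1), date(2021, 2, 6), existing)
```

Under these assumptions the work is done period by period, and repeating the run over the same dates is harmless, which is the property A2 relies on.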

Ultra Champion

I think that running it more frequently, and avoiding the cluster latency issues by keeping the search window a bit behind the current time, is a good way to do it.

About the latency - How can we avoid data loss in the summary indexes when there is an indexing latency in the cluster?

Builder

Gregg, great. For point 2, I got it all working, thanks.

For point 1, just one clarification: if I created the first SI from raw data with a search that runs every day, then when I run the weekly or monthly search on that SI, should I specify the date range on the search that collects, or use a timechart span in the search?

That is, in the query, should I collect by timechart with span=7d for the weekly run, or just collect whatever is in the SI on a weekly schedule?

Or

specify earliest and latest on the search as -7d@d and -1d@d to collect all of it weekly?

Which is the right option to go for?

Esteemed Legend

For hourly this doesn't matter, but for "daily" on up, be aware that the TZ of the user who runs the populating search bakes a bias for that particular TZ into the summary. So if you are a multi-site (multi-TZ) operation, people in other TZs may be confused when their raw searches for "yesterday" or "last week" do not match yours.
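As a rough illustration of that TZ bias, the Python sketch below computes what "yesterday" means, expressed in UTC, for two users at different fixed UTC offsets. The offsets, the timestamp, and the helper `yesterday_window` are all made up for the example; real time zones with daylight saving would need a proper TZ database.

```python
from datetime import datetime, timedelta, timezone

def yesterday_window(now_utc, utc_offset_hours):
    """Return (start, end) of 'yesterday', in UTC, for a user at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    # Snap to local midnight, the way a day-aligned search window would.
    local_midnight = now_utc.astimezone(tz).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = local_midnight - timedelta(days=1)
    return start.astimezone(timezone.utc), local_midnight.astimezone(timezone.utc)

now = datetime(2021, 2, 10, 12, 0, tzinfo=timezone.utc)
summary_owner = yesterday_window(now, -5)  # hypothetical populating user at UTC-5
remote_user = yesterday_window(now, 9)     # hypothetical user at UTC+9

# How far apart the two users' "yesterday" windows start.
shift = summary_owner[0] - remote_user[0]
```

Here the two windows start 14 hours apart, so a daily summary built on the first user's day boundaries cannot line up with the second user's raw search for "yesterday".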

And yes, obviously, for each rollup populating search layer, you need to wait for the populating searches of the lower layer to finish and leave a little buffer at the end (e.g. wait an hour after the last hourly populating search for the day starts before running your daily populating search).
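A small Python sketch of why that buffer matters: if the upper layer runs before the lower layer has written its last period, the rollup silently undercounts. The `weekly_rollup` function and the `daily_totals` mapping are hypothetical and only model the idea; this is not how Splunk itself schedules or checks populating searches.

```python
from datetime import date, timedelta

def weekly_rollup(daily_totals, week_start):
    """Sum seven daily summary values, or return None if any day is missing.

    `daily_totals` maps a date to that day's summarized count (a hypothetical
    stand-in for rows in the daily summary index).
    """
    days = [week_start + timedelta(days=i) for i in range(7)]
    if any(day not in daily_totals for day in days):
        # The lower layer has not finished; better to run later than undercount.
        return None
    return sum(daily_totals[day] for day in days)

week_start = date(2021, 2, 1)
# Only six of the seven daily summaries exist so far.
daily = {week_start + timedelta(days=i): 100 + i for i in range(6)}
incomplete = weekly_rollup(daily, week_start)

# After the last daily populating search finishes, the rollup is complete.
daily[week_start + timedelta(days=6)] = 106
complete = weekly_rollup(daily, week_start)
```

In practice the same effect comes from scheduling: the weekly search is simply timed late enough that all seven daily runs have finished.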

Builder

I did some work on summary indexing and now understand it, and how to use it, much better. I have been successful in using it in many of my reports over large data and am happy with the results. Summary indexing rocks 🙂
