backfill_all.sh seems very slow

Path Finder

Trying out the new Web Intelligence app on a test box and, after doing a oneshot of some logs (indexer shows about 18 million events), I ran the backfill_all.sh script. Twenty-two hours later, it's still churning but very slowly.

Is this normal? I can't imagine what this would be like on a high-volume webserver - ours is merely middling.

1 Solution

Motivator

The backfill scripts run every summary search needed to populate the summary index, once per configured search window, across the time period you specify.

So if you have a summary search that runs over a 5-minute time range, and you backfill one day's worth of summary data, the backfill script will run the search 24 hours * 60 minutes / 5 minutes = 288 times. Usually it runs multiple searches concurrently to speed this process up a bit.

If you index one month's worth of data (30 days) and run the backfill script over this period, you will end up running 30 * 288 = 8,640 searches. Assuming 4 concurrent searches, with each set of 4 searches taking 15 seconds to complete, that is 2,160 sets * 15 seconds, or approximately 9 hours to finish backfilling this summary data. Now if you have multiple summary searches and an even larger time frame, you can see how the backfill process can take a very long time, especially if your indexer isn't the fastest (which is usually the case on test boxes).
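To plug in your own numbers, here is a rough Python sketch of the same arithmetic. The function name and parameters are made up for illustration and are not part of the backfill scripts; the 15 seconds per set is just the assumption from above, so measure your own box before trusting the result.

```python
import math

def backfill_estimate(days, window_minutes, concurrency, seconds_per_batch):
    """Return (number_of_searches, estimated_hours) for ONE summary search.

    Hypothetical helper: it only reproduces the back-of-the-envelope
    arithmetic above and knows nothing about the real backfill scripts.
    """
    # One search per summary window across the whole backfill period.
    searches = days * 24 * 60 // window_minutes
    # Searches run in sets of `concurrency`; each set is assumed to take
    # `seconds_per_batch` seconds in total.
    batches = math.ceil(searches / concurrency)
    hours = batches * seconds_per_batch / 3600
    return searches, hours

# 30 days, 5-minute windows, 4 concurrent searches, 15 seconds per set of 4.
searches, hours = backfill_estimate(30, 5, 4, 15)
print(searches, hours)
```

Multiply the hours by the number of summary searches the app defines to get a ballpark for the whole run.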

So yes, this is normal.

SplunkTrust

For what it's worth, I've found that if you know what you're doing with stats, you can manually run the backfill for each saved search in a single pass by binning by time and then doing '| stats by _time'. Although this is obviously a lot more error-prone, it is, for whatever reason, immensely faster than letting the script run a separate search for each individual time period.
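To illustrate why a single pass can come out the same, here is a small Python analogy (not SPL, and not what the backfill script actually does). It counts events per 5-minute window two ways: once by rescanning the data for every window, and once by binning each timestamp and counting by bin. The function names, the 300-second window, and aligning bins to the start time are all assumptions made for this sketch.

```python
from collections import Counter

WINDOW = 300  # seconds; assumed 5-minute summary window

def per_window_counts(timestamps, start, end, window=WINDOW):
    """Scan the data once per window, like one search per time range."""
    result = {}
    for lo in range(start, end, window):
        n = sum(1 for t in timestamps if lo <= t < lo + window)
        if n:
            result[lo] = n
    return result

def single_pass_counts(timestamps, start, end, window=WINDOW):
    """Scan the data once: map each timestamp to its window start, then count."""
    bins = (
        start + (t - start) // window * window
        for t in timestamps
        if start <= t < end
    )
    return dict(Counter(bins))

events = [0, 10, 299, 300, 301, 899, 900, 1500]
assert per_window_counts(events, 0, 1200) == single_pass_counts(events, 0, 1200)
```

Both give the same per-window counts, but the first rescans the events for every window. The error-prone part in practice is making sure the binned search really reproduces what each saved search would have written to the summary index.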