Knowledge Management

How can I backfill summary index data from the web?

therealdpk
Path Finder

I would like to create a few summary indexes in order to run some searches more quickly -- starting with the search in http://splunk-base.splunk.com/answers/59927/improve-speed-of-append -- but I have not found any information on how to do this from the web. A lot of folks talk about using a command line script, but that's currently not an option (and not ideal, especially for re-summary-indexing in case of outages).

0 Karma
1 Solution

Lucas_K
Motivator

AFAIK there is no way to do it from the web GUI.

At best, you could build your own web interface that calls the "backfill" Python script and passes parameters to it.

Regardless, it STILL needs to call fill_summary_index.py, located in the $SPLUNK_HOME/bin directory (e.g. /opt/splunk/bin).

Your only other alternative is the collect command, which builds the summary index data from a normal search.

Craft your search so that all the fields match what would appear in the summary index, put the "collect" command at the end, and specify the summary index to save the stash results to. You may also need "addinfo", depending on how the summary is used.

More details here : http://docs.splunk.com/Documentation/Splunk/5.0/SearchReference/collect
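As a sketch of that approach (the index, sourcetype, and field names here are placeholders — adjust them to your own data), a one-off backfill search ending in collect might look like:

```
index=web sourcetype=access_combined earliest=-30d@d latest=@d
| bucket _time span=1h
| stats count by _time, status
| collect index=summary_web
```

Run this once over the historical window you want to backfill; collect writes the results into the summary_web index as stash events, after which a scheduled search can keep it topped up.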



Lucas_K
Motivator

Which means that if you haven't created a summary index, it won't appear in the drop-down!

0 Karma

Lucas_K
Motivator

A summary index is the same as a normal index; the only difference is that you save only summary data to it. Thus it needs to be created by you via the normal index creation methods.

In general, I think it's best practice not to save your results into your original data index, so create a new summary index that's easily identifiable. What I use is summary_<my_original_index_name>.
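For example (the index name below is illustrative), a summary index can be defined like any other index in indexes.conf:

```
# $SPLUNK_HOME/etc/system/local/indexes.conf
[summary_web]
homePath   = $SPLUNK_DB/summary_web/db
coldPath   = $SPLUNK_DB/summary_web/colddb
thawedPath = $SPLUNK_DB/summary_web/thaweddb
```

A restart picks up the new stanza; the same index can also be created from the index management page in the web UI.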

I'm confused: you said you "haven't created any summary indexes" but then say "the index appears in the drop-down". The drop-down isn't the source of the events but the save destination for the results.

0 Karma

therealdpk
Path Finder

No, I haven't created any summary indexes yet. This'll be my first ever. No idea if the account has access to the index, but the index appears in the drop-down, so I assume so (does Splunk handle that right?). It's all enabled, FWIW.

0 Karma

Lucas_K
Motivator

I assume you've already created your summary indexes and that the account that runs the scheduled searches has access to this index?

A scheduled search won't save its results to a summary index by default; you have to enable it.

There is an "enable" checkbox at the bottom, as well as a drop-down to choose the summary index to save to.
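That checkbox and drop-down correspond to the action.summary_index settings in savedsearches.conf. A sketch, with a hypothetical search and index name:

```
# $SPLUNK_HOME/etc/system/local/savedsearches.conf
[hourly web status summary]
search = index=web sourcetype=access_combined | bucket _time span=1h | stats count by _time, status
cron_schedule = 5 * * * *
enableSched = 1
action.summary_index = 1
action.summary_index._name = summary_web
```

Here action.summary_index = 1 is the "enable" checkbox and action.summary_index._name is the drop-down selection.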

0 Karma

therealdpk
Path Finder

I've re-read it a few times, still not totally sure I'm getting it. If any Splunk staff are watching this I'd ask that they make a simple step 1, 2, 3 guide to accompany the document, heh.

I figure at this point I'll use "collect" to populate the summary index initially and then use a scheduled search to keep topping it off. Scheduled searches don't seem to create or populate summary indexes in my installation, so I'll have to work with the support staff directly to figure out why (no errors, the query runs according to the inspector, but the results go nowhere).

0 Karma

Lucas_K
Motivator

(not enough chars left to type this)

So what you do is run your first search via the web GUI with the collect command to pre-populate your summary index, then just leave your scheduled search to continue the process at the defined time intervals.

0 Karma

Lucas_K
Motivator

OK, I think you may have missed the entire purpose of scheduled searches with summary indexing enabled. It works exactly like what you're describing in your last sentence.

i.e. run a search every 5 minutes that collects statistics within some time frame, and save the results to a summary index that you can refer to later.

This time frame can use any of Splunk's time specifiers, i.e. relative, absolute, etc.
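For instance (hypothetical index and field names), a 5-minute scheduled search saved with summary indexing enabled could be as simple as:

```
index=web sourcetype=access_combined earliest=-5m@m latest=@m
| stats count by host, status
```

Scheduled every 5 minutes with a summary index selected as the destination, each run appends one 5-minute slice of statistics to the summary.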

Have a read of this and see if it makes sense in your situation.

http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Usesummaryindexing

0 Karma

therealdpk
Path Finder

I don't necessarily want to run it repeatedly, but I do want the summary index to be kept up to date with no manual intervention. I figure I'd run something like "index=something sourcetype=something_else | bucket _time span=5m | stats count by _time" with relative times or something.

If collect can append data to the end of an existing summary index, I should be able to do what I need. I'd just need to figure out how to create an index from the web, but that's an unrelated question.

0 Karma

Lucas_K
Motivator

Your saved-search results should go into a summary index anyway, so I'm not sure what you're trying to achieve by running it repeatedly. Would a normal scheduled search get you your results?

If it's ALWAYS the same search and same set of results (but perhaps with slightly updated counts or something), then just create a scheduled saved search that populates a summary index. You will then need to correctly dedup any searches that use this summary index. You will also need to window that search correctly rather than searching all time; otherwise it will get slower over time as the number of results increases.
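A windowed search against the summary index (names illustrative) might look like:

```
index=summary_web earliest=-24h@h latest=@h
| stats sum(count) as total by status
```

Bounding earliest/latest like this keeps the search reading a fixed-size window of summary events, so it stays fast even as the summary index grows.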

0 Karma

therealdpk
Path Finder

Interesting -- do you know if I could schedule a collect command to run repeatedly, continually filling and topping off the summary index?

0 Karma