What is the cleanest way to store when a search was last run?

Explorer

I have a search that generates data and outputs it as a CSV, which another program processes later. Due to the nature of the other program, I need to ensure that I never output the same data twice in my CSV, or it will count the results twice and give inaccurate scores. So I want my saved search, which runs on an automated schedule, to return only results that were added since the last time the search ran.

What is the cleanest way to do this? So far I only know of two options, and I don't really like either. The first is to hard-code the interval between runs into the search itself: if my search runs every 8 hours, I add a criterion that looks for _time > now - 8 hours. Of course, if someone changes how often the search runs, or runs it manually, this breaks.

The second is to save a text file with a "last run" date that gets loaded, and make the search look for _time > last_run. However, I don't know how to do this entirely within Splunk. I only know how to do it with a separate Python script using the Splunk SDK, which I can do but would prefer to avoid.

Is there a cleaner way to maintain an awareness of the last time a search was run so that I only look at any data that was added to Splunk since that time?

Splunk Employee

Hi,

If you want to search for the searches themselves, index=_audit would be a better option, because it gives you detailed information about the type of search (scheduled, accelerated, ad hoc, etc.).

Your second option could be done using "outputcsv" or other lookup techniques.
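The second option, which Holger suggests implementing with outputcsv or a lookup, is a watermark (checkpoint) pattern. The Python sketch below shows only that logic, outside Splunk: a small CSV file (the hypothetical `last_run.csv`) holds the last-run time, and each run returns events with `last_run < _time <= now` and then stores `now`. The file name and the `read_last_run`/`write_last_run`/`new_events` helpers are illustrative assumptions, not Splunk APIs; events are plain dicts with an epoch-seconds `_time` key.

```python
import csv
import os

# Hypothetical checkpoint file standing in for a Splunk lookup CSV.
CHECKPOINT = "last_run.csv"


def read_last_run(path=CHECKPOINT):
    """Return the stored watermark in epoch seconds, or 0.0 if none exists yet."""
    if not os.path.exists(path):
        return 0.0
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return float(rows[-1]["last_run"]) if rows else 0.0


def write_last_run(ts, path=CHECKPOINT):
    """Overwrite the checkpoint file with a single last_run row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["last_run"])
        writer.writeheader()
        writer.writerow({"last_run": ts})


def new_events(events, now, path=CHECKPOINT):
    """Return events with last_run < _time <= now, then advance the watermark to now."""
    last_run = read_last_run(path)
    fresh = [e for e in events if last_run < e["_time"] <= now]
    write_last_run(now, path)
    return fresh
```

Fixing the upper bound at `now` when the run starts, and storing that same value, means each event falls into exactly one window. Advancing the watermark as part of the run favors never duplicating over never missing, which matches the requirement here. Because the question asks for data *added* since the last run, comparing against Splunk's `_indextime` rather than `_time` may be worth considering if events can arrive late.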

HTH,

Holger

SplunkTrust

First, the hard-coded approach isn't that bad in Splunk: if your search runs on the cron schedule 0 */8 * * *, you just set its time range to eight hours and you're pretty much there.
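To keep the fixed-interval approach from drifting when the scheduler dispatches a few minutes late, the window can be snapped to whole hours, which in Splunk time-modifier terms is roughly `earliest=-8h@h latest=@h`. This Python sketch, with a hypothetical `hour_snapped_window` helper, mimics that arithmetic to show that consecutive runs of `0 */8 * * *` tile without gaps or overlaps, assuming each run starts within the hour it was scheduled for. It does not help with manual runs or schedule changes, which are the cases the question worries about.

```python
from datetime import datetime, timedelta


def hour_snapped_window(run_time, hours=8):
    """Mimic earliest=-<hours>h@h latest=@h: snap run_time down to the hour,
    then look back a fixed number of whole hours."""
    latest = run_time.replace(minute=0, second=0, microsecond=0)
    earliest = latest - timedelta(hours=hours)
    return earliest, latest


if __name__ == "__main__":
    # Runs dispatched a few minutes late still produce back-to-back windows.
    for t in (datetime(2024, 1, 1, 8, 2), datetime(2024, 1, 1, 16, 5)):
        print(hour_snapped_window(t))
```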

For a different approach, you can query index=_internal sourcetype=scheduler savedsearch_name="yoursearch" to find when the search last ran, what time range it covered, and so on. Use that in the next run to calculate the time range it needs to cover.
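The scheduler-log approach boils down to: find the most recent successful run of the saved search, and start the next window where that run stopped. The sketch below models that in plain Python. `RunRecord`, its `covered_latest` field, and the `next_window` helper are hypothetical simplifications of whatever the scheduler or audit events actually record; check the real field names in your own `_internal` or `_audit` data before relying on them.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class RunRecord:
    """Hypothetical, simplified view of one logged run of a saved search."""
    savedsearch_name: str
    status: str            # e.g. "success"; anything else counts as not completed
    covered_latest: float  # epoch seconds at which that run's time range ended


def next_window(records: Iterable[RunRecord], name: str, now: float,
                default_earliest: float = 0.0) -> Tuple[float, float]:
    """Start where the last successful run of `name` stopped and end at `now`."""
    ends = [r.covered_latest for r in records
            if r.savedsearch_name == name and r.status == "success"]
    earliest = max(ends) if ends else default_earliest
    return earliest, now
```

Taking the maximum end time over successful runs only means a skipped or failed run does not advance the window, so its data is picked up by the next successful run.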