
Prevent Splunk from re-indexing data


Here is the scenario:

On any given day (usually at midnight), I am auto-indexing 3-5 files into Splunk, then running a search query and producing an output report using "outputcsv".

However, new files can be auto-indexed into Splunk at any point during the day. Right now, the search results include events from the newly indexed files as well as events from the older, previously indexed files.

Is there a way to configure Splunk (or use an app) to generate a separate report as each unique file gets indexed? In other words, if I index 5 XML files from Symantec, I want to create 5 separate .csv files, each containing only its own events, without duplicating events from a previously indexed file.


Splunk Employee

This can be done with a scheduled search, by altering your search to filter on the index time rather than the event's own timestamp when creating the CSV file you are looking for.

So you will want to use something like...

sourcetype="Symantec" | eval myTime=now()-60 | where _indextime>myTime | outputcsv myFile.csv 

To break down the important pieces of this search: the | eval myTime=now()-60 portion uses eval to create a field containing the epoch timestamp of when the search runs, minus 60 seconds. The | where _indextime>myTime statement then keeps only the events indexed within the last minute, by comparing each event's index time against the myTime value. If you schedule the search to run every minute, each run will contain only events indexed within the last minute. This should prevent most cases of duplication for the events that are put into your CSV file.

Depending on how often you run the scheduled search, you will want to change the myTime offset to match the interval of your scheduled search. In doing so, remember that both now() and _indextime are epoch times expressed in seconds.
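
As a rough sketch of that adjustment, assuming the search is scheduled to run every 5 minutes (the 5-minute interval is only an illustration, not something from the original question), the offset becomes 300 seconds. The sourcetype and file name are carried over from the search above:

```
sourcetype="Symantec"
| eval myTime=now()-300
| where _indextime>myTime
| outputcsv myFile.csv
```

Note that the where clause only filters events the search has already retrieved, so the search's own time range still needs to be wide enough to cover the event timestamps inside the newly indexed files.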
