Splunk Search

List files in search then table the duration between beginning and end of processing

Explorer

I'm trying to limit the scope of a search to a 24 hour window and list all of the files analyzed within that period but then display a table of how long it took to finish processing each file.

For example, I have:

2015-07-31 15:56:13.7599|DEBUG|FileAnalyzer|Finished file.zip
2015-07-31 15:56:13.0099|DEBUG|FileAnalyzer|Starting file.zip

I can get something as simple as:

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | stats count by inputfile

But that's a simple count. I'm looking for a table that will resemble something like:

INPUTFILE | DURATION
file.zip | 3.067

I've tried a combination of map, transaction, and table with and without subsearches without being successful. Any help would be most appreciated.

Tags (3)
0 Karma
1 Solution

Revered Legend

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

View solution in original post

Revered Legend

How about this

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration

View solution in original post

Explorer

I appreciate your suggestion and it looks like that is working exactly as I had expected. Thank you!

0 Karma

Esteemed Legend

Assuming that a file can be processed more than once, then this should work:

  | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | streamstats last(EndTime) AS EndTime by inputfile | where Stage="Starting" | eval Duration = EndTime - _time| table inputfile Duration
0 Karma

Explorer

Streamstats still has the same issue but also introduced multiple instances of the input file name without a duration.

0 Karma

Esteemed Legend

Run this search for "last 24 hours":

 | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | stats earliest(StartTime) AS StartTime latest(EndTime) AS EndTime by inputfile | eval Duration = EndTime - StartTime | table inputfile Duration

Explorer

That looks to be a brilliant solution! However, the main search brings up 14 results of starting/finished log entry pairs which would be 7 files but I'm getting 6 results with one of them >1,000 seconds (for duration). The one file that is >1,000 seconds shows up twice in the search results but both instances are ❤️ seconds. I think what's happening is that the start time if the first instance is being matched with the end time of the last instance; would I be able to call upon your perspicacity for thoughts on this?

0 Karma

Esteemed Legend

My solution presumes that you are timestamping your events correctly and that each event's _time value is the same as the first PSV value in each row. Is this the case? It also presumes that each file is only processed once and I think that is where the breakdown is. Is it possible for a file to be processed more than once?

0 Karma

Explorer

The timestamps should be handled correctly but I'm afraid I don't understand the second part of that sentence. Under ideal circumstances, the assumption that a file would be used only once would be accurate but during testing it is possible that a file may be processed more than once.

0 Karma