Splunk Search

List files in search then table the duration between beginning and end of processing

jamieivanov
Explorer

I'm trying to limit the scope of a search to a 24-hour window, list all of the files analyzed within that period, and then display a table of how long it took to finish processing each file.

For example, I have:

2015-07-31 15:56:13.7599|DEBUG|FileAnalyzer|Finished file.zip
2015-07-31 15:56:13.0099|DEBUG|FileAnalyzer|Starting file.zip

I can get something as simple as:

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*\.zip)" | stats count by inputfile

But that's a simple count. I'm looking for a table that will resemble something like:

INPUTFILE | DURATION
file.zip | 3.067

I've tried combinations of map, transaction, and table, with and without subsearches, without success. Any help would be most appreciated.

1 Solution

somesoni2
Revered Legend

How about this:

host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*\.zip)" | transaction maxevents=2 inputfile | table inputfile duration
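Outside of Splunk, the pairing that `transaction maxevents=2 inputfile` performs here, grouping each file's Starting and Finished events and computing the gap as `duration`, can be sketched in Python. The two log lines are the ones from the question; everything else is illustrative:

```python
from datetime import datetime

# The two sample log lines from the question (timestamp|level|component|message).
logs = [
    "2015-07-31 15:56:13.0099|DEBUG|FileAnalyzer|Starting file.zip",
    "2015-07-31 15:56:13.7599|DEBUG|FileAnalyzer|Finished file.zip",
]

def parse(line):
    # Split the pipe-separated fields, then split the message into stage + filename.
    ts, _level, _component, msg = line.split("|")
    stage, name = msg.split(" ", 1)
    return name, stage, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")

# Pair each file's Starting event with its Finished event and take the
# time difference -- roughly what the transaction command does here.
starts, durations = {}, {}
for line in logs:
    name, stage, t = parse(line)
    if stage == "Starting":
        starts[name] = t
    elif stage == "Finished" and name in starts:
        durations[name] = (t - starts.pop(name)).total_seconds()

print(durations)  # → {'file.zip': 0.75}
```

For these two sample events the gap is 0.75 seconds (13.7599 minus 13.0099).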


jamieivanov
Explorer

I appreciate your suggestion and it looks like that is working exactly as I had expected. Thank you!


woodcock
Esteemed Legend

Assuming that a file can be processed more than once, then this should work:

 | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*\.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | streamstats last(EndTime) AS EndTime by inputfile | where Stage="Starting" | eval Duration = EndTime - _time | table inputfile Duration
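The intent of the streamstats step can be illustrated outside Splunk: search results arrive newest-first, and `streamstats last(EndTime) by inputfile` carries the most recently seen EndTime down the stream per file, so each Starting event pairs with the Finished event of the same run even when a file is processed more than once. A minimal Python sketch (the epoch-second times are made up):

```python
# Events in the order Splunk returns them: most recent first.
# Two hypothetical runs of the same file.
events = [
    ("file.zip", "Finished", 100.75),
    ("file.zip", "Starting", 100.00),
    ("file.zip", "Finished", 50.50),
    ("file.zip", "Starting", 50.00),
]

# Carry the most recently seen EndTime per file down the stream
# (the effect of streamstats last(EndTime) by inputfile here),
# then emit a Duration row for each Starting event.
last_end = {}
rows = []
for name, stage, t in events:
    if stage == "Finished":
        last_end[name] = t
    elif stage == "Starting" and name in last_end:
        rows.append((name, last_end[name] - t))

print(rows)  # → [('file.zip', 0.75), ('file.zip', 0.5)]
```

Each run yields its own duration row, rather than one row spanning both runs.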

jamieivanov
Explorer

Streamstats still has the same issue, but it also introduced multiple instances of the input file name without a duration.


woodcock
Esteemed Legend

Run this search for "last 24 hours":

 | rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*\.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | stats earliest(StartTime) AS StartTime latest(EndTime) AS EndTime by inputfile | eval Duration = EndTime - StartTime | table inputfile Duration

jamieivanov
Explorer

That looks to be a brilliant solution! However, the main search brings up 14 results of Starting/Finished log entry pairs, which would be 7 files, but I'm getting 6 results, one of them >1,000 seconds in duration. The file that shows >1,000 seconds appears twice in the search results, but both instances are <3 seconds. I think what's happening is that the start time of the first instance is being matched with the end time of the last instance; may I call upon your perspicacity for thoughts on this?


woodcock
Esteemed Legend

My solution presumes that you are timestamping your events correctly and that each event's _time value matches the first pipe-separated (PSV) field in each row. Is this the case? It also presumes that each file is only processed once, and I think that is where the breakdown is. Is it possible for a file to be processed more than once?
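That breakdown can be reproduced with a toy calculation: when a file is processed twice, `earliest(StartTime)` takes the first run's start while `latest(EndTime)` takes the second run's end, so the result spans both runs instead of measuring either one. A hypothetical example (the epoch-second pairs are made up):

```python
# Two hypothetical runs of the same file as (start, end) epoch seconds:
# run 1 takes 3.0 s, run 2 starts ~1000 s later and takes 2.5 s.
runs = [(0.0, 3.0), (1000.0, 1002.5)]

start_times = [start for start, _ in runs]
end_times = [end for _, end in runs]

# earliest(StartTime) and latest(EndTime) collapse both runs into one pair,
# so the computed duration spans the gap between runs.
duration = max(end_times) - min(start_times)
print(duration)  # → 1002.5, not 3.0 or 2.5
```

This matches the symptom above: two sub-3-second runs collapsing into a single >1,000-second duration.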


jamieivanov
Explorer

The timestamps should be handled correctly, but I'm afraid I don't understand the second part of that sentence. Under ideal circumstances the assumption that a file would be processed only once would be accurate, but during testing it is possible that a file may be processed more than once.
