I'm trying to limit the scope of a search to a 24 hour window and list all of the files analyzed within that period but then display a table of how long it took to finish processing each file.
For example, I have:
2015-07-31 15:56:13.7599|DEBUG|FileAnalyzer|Finished file.zip
2015-07-31 15:56:13.0099|DEBUG|FileAnalyzer|Starting file.zip
I can get something as simple as:
host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | stats count by inputfile
But that's a simple count. I'm looking for a table that will resemble something like:
INPUTFILE | DURATION
file.zip | 3.067
I've tried a combination of map, transaction, and table with and without subsearches without being successful. Any help would be most appreciated.
How about this
host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration
How about this
host=server earliest=7/31/2015:00:00:00 latest=7/31/2015:23:59:59 ("Starting" OR "Finished") | rex "(?<inputfile>\w*.zip)" | transaction maxevents=2 inputfile | table inputfile duration
I appreciate your suggestion and it looks like that is working exactly as I had expected. Thank you!
Assuming that a file can be processed more than once, then this should work:
| rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | streamstats last(EndTime) AS EndTime by inputfile | where Stage="Starting" | eval Duration = EndTime - _time| table inputfile Duration
Streamstats still has the same issue but also introduced multiple instances of the input file name without a duration.
Run this search for "last 24 hours":
| rex "\|(?<Stage>\w+)\s+(?<inputfile>\w*.zip)" | eval StartTime=if(Stage="Starting", _time, null()) | eval EndTime=if(Stage="Finished", _time, null()) | stats earliest(StartTime) AS StartTime latest(EndTime) AS EndTime by inputfile | eval Duration = EndTime - StartTime | table inputfile Duration
That looks to be a brilliant solution! However, the main search brings up 14 results of starting/finished log entry pairs which would be 7 files but I'm getting 6 results with one of them >1,000 seconds (for duration). The one file that is >1,000 seconds shows up twice in the search results but both instances are ❤️ seconds. I think what's happening is that the start time if the first instance is being matched with the end time of the last instance; would I be able to call upon your perspicacity for thoughts on this?
My solution presumes that you are timestamping
your events correctly and that each event's _time
value is the same as the first PSV value in each row. Is this the case? It also presumes that each file is only processed once and I think that is where the breakdown is. Is it possible for a file to be processed more than once?
The timestamps should be handled correctly but I'm afraid I don't understand the second part of that sentence. Under ideal circumstances, the assumption that a file would be used only once would be accurate but during testing it is possible that a file may be processed more than once.