Hello,
I am looking for optimization advice for a use case in which I need to create new event data and then calculate time delta between two timestamps.
I have a set of tasks for which I need to calculate their lifetime, either from open to close, or from open to now() . Each task can be subject to multiple state changes: New, Waiting, In Progress, Complete, Closed. The objective is to calculate the time that a task is in each state (easy), and for those that are not closed I have to calculate the time up until now() that it has been in its last state (difficult). I would like some advice on how to manage the tasks that are not closed.
I will now describe what I have currently done using a simplified data structure. Firstly the data:
id,state,timestamp,prev_timestamp
id001,New,2018-11-01 14:32:50,
id001,In Progress,2018-11-01 16:54:43,2018-11-01 14:32:50
id001,Complete,2018-11-02 09:23:01,2018-11-01 16:54:43
id001,Closed,2018-11-02 15:41:53,2018-11-02 09:23:01
id002,New,2018-11-01 16:04:36,
id002,Waiting,2018-11-01 16:54:23,2018-11-01 16:04:36
id002,In Progress,2018-11-05 11:12:51,2018-11-01 16:54:23
id001 is the simple case because it is currently not open so I just calculate all the deltas, while id002 is the more difficult case since it is not closed. It's difficult because I first have to create a new event that represents the time between the last state change until now() , and it is here that I would like some advice. Saving the above data as sample.csv and loading it into Splunk, I currently use the following search to get what I'm after:
| inputlookup sample.csv
| streamstats count as running by id
| eventstats count as final by id
| eval last = if(running=final,1,0)
| eval now = strftime(now(),"%Y-%m-%d %H:%M:%S")
| table id state timestamp prev_timestamp running final last now
| eval timestamp = if(last=1 AND NOT state="Closed",mvappend(timestamp,mvzip(timestamp,now,",")),timestamp)
| mvexpand timestamp
| eval timestamp = split(timestamp,",")
| eval prev_timestamp = if(mvcount(timestamp)>1,mvindex(timestamp,0),prev_timestamp)
| eval timestamp = if(mvcount(timestamp)>1,mvindex(timestamp,1),timestamp)
| eval hours_in_state = round((strptime(timestamp,"%Y-%m-%d %H:%M:%S") - strptime(prev_timestamp,"%Y-%m-%d %H:%M:%S"))/3600,2)
To summarize what the search is doing:
Use streamstats and eventstats combination to identify the last event for each task (field called last )
For each task's last event, determine whether it is Closed
If a task's last event is not Closed (such as for id002), use mvzip and mvexpand to create a new event
Calculate time in hours
Step 3 is the most expensive operation because I have to concatenate two timestamps, create a multivalue, expand that multivalue to get the additional event, then split the timestamps again into their appropriate fields so that I can make the delta calculation. If you run it with two tasks it's fast, but I have more than 20000 tasks which sometimes causes the search to fail.
What I'd like to ask is: is there a less expensive and more elegant way to accomplish the creation of an additional event for tasks that are not closed?
Thank you in advance and best regards,
Andrew
... View more