I have the following result from as search and would like help matching the start and end dates. These are two separate arrays of dates.
start time end of start
6/1/2014 7:43 6/1/2014 8:42
6/2/2014 6:31 6/1/2014 8:52
6/3/2014 5:22 6/1/2014 9:16
6/4/2014 6:27 6/2/2014 7:30
6/5/2014 6:22 6/3/2014 6:19
6/6/2014 6:17 6/3/2014 10:33
6/7/2014 9:09 6/3/2014 18:08
6/8/2014 6:51 6/4/2014 7:05
6/9/2014 8:12 6/4/2014 7:13
6/10/2014 6:23 6/5/2014 6:58
6/14/2014 8:05 6/6/2014 6:55
The result I expect is shown below:
start time end of start
6/1/2014 7:43 6/1/2014 8:42
6/2/2014 6:31 6/2/2014 7:30
6/3/2014 5:22 6/3/2014 6:19
6/4/2014 6:27 6/4/2014 7:05
..and so forth.
Note that I skip all start dates that occur before the most recent(previous) end of start and skip any end of start that occurs before the most recent(previous) start time to build my pairs of start and end. How do i accomplish this is a Splunk search?
After some research I came up with what seems a good enough (not the best) solution for this problem. The issue was that I was framing the problem in the context of how I would solve it in an object oriented programming language and not SPL.I had two arrays of dates that I would need to loop through and create date pairs as my solution.
To reframe the problem in Splunk terms I had to combine the two arrays into one and keep them in their original chronological events and add a new column to the event to indicate whether it is a Start Time or an End Of Start event.So my list of events would now look like this:
Equipment EventList ctonline cteos
CT1 6/1/2014 7:43 StartTime
CT1 6/1/2014 8:42 End Of Start
CT1 6/1/2014 8:52 End Of Start
CT1 6/1/2014 9:16 End Of Start
CT1 6/2/2014 6:31 StartTime
CT1 6/2/2014 7:30 End Of Start
CT1 6/3/2014 5:22 StartTime
CT1 6/3/2014 6:19 End Of Start
CT1 6/3/2014 10:33 End Of Start
CT1 6/3/2014 18:08 End Of Start
CT1 6/4/2014 6:27 StartTime
CT1 6/4/2014 7:05 End Of Start
CT1 6/4/2014 7:13 End Of Start
CT1 6/5/2014 6:22 StartTime
CT1 6/5/2014 6:58 End Of Start
Given this I can then use the streamstats command to mark a valid StartTime event as one that occurs after an non-StartTime event and a valid End of Start event as one that occurs after a non-End of Start event.The result of these can be stored in two new columns as shown below:
Equipment EventList ctonline cteos prevctonline prevcteos
CT1 6/1/2014 7:43 StartTime 1 0
CT1 6/1/2014 8:42 End Of Start 0 1
CT1 6/1/2014 8:52 End Of Start 0 0
CT1 6/1/2014 9:16 End Of Start 0 0
CT1 6/2/2014 6:31 StartTime 1 0
CT1 6/2/2014 7:30 End Of Start 0 1
CT1 6/3/2014 5:22 StartTime 1 0
CT1 6/3/2014 6:19 End Of Start 0 1
CT1 6/3/2014 10:33 End Of Start 0 0
CT1 6/3/2014 18:08 End Of Start 0 0
CT1 6/4/2014 6:27 StartTime 1 0
CT1 6/4/2014 7:05 End Of Start 0 1
CT1 6/4/2014 7:13 End Of Start 0 0
CT1 6/5/2014 6:22 StartTime 1 0
CT1 6/5/2014 6:58 End Of Start 0 1
Now we can apply the transact command to group events that only fall between a valid StartTime and a valid End of Start event. Below is the code to achieve this.
...|streamstats current=f window=1 last(ctonline) as prevctonline last(cteos) as prevcteos | reverse | transaction Equipment startswith=eval(ctonline!=coalesce(prevctonline,"") and ctonline="Start") endswith=eval(cteos!=coalesce(prevcteos,"") and cteos="EndOfStart") | sort TimeStamp | timechart avg(duration) by Equipment
The result of the query above gives you the correct grouping for the Start Time and End of Start events .. without having to loop through the arrays as before.Hopefully this helps someone else.
After some research I came up with what seems a good enough (not the best) solution for this problem. The issue was that I was framing the problem in the context of how I would solve it in an object oriented programming language and not SPL.I had two arrays of dates that I would need to loop through and create date pairs as my solution.
To reframe the problem in Splunk terms I had to combine the two arrays into one and keep them in their original chronological events and add a new column to the event to indicate whether it is a Start Time or an End Of Start event.So my list of events would now look like this:
Equipment EventList ctonline cteos
CT1 6/1/2014 7:43 StartTime
CT1 6/1/2014 8:42 End Of Start
CT1 6/1/2014 8:52 End Of Start
CT1 6/1/2014 9:16 End Of Start
CT1 6/2/2014 6:31 StartTime
CT1 6/2/2014 7:30 End Of Start
CT1 6/3/2014 5:22 StartTime
CT1 6/3/2014 6:19 End Of Start
CT1 6/3/2014 10:33 End Of Start
CT1 6/3/2014 18:08 End Of Start
CT1 6/4/2014 6:27 StartTime
CT1 6/4/2014 7:05 End Of Start
CT1 6/4/2014 7:13 End Of Start
CT1 6/5/2014 6:22 StartTime
CT1 6/5/2014 6:58 End Of Start
Given this I can then use the streamstats command to mark a valid StartTime event as one that occurs after an non-StartTime event and a valid End of Start event as one that occurs after a non-End of Start event.The result of these can be stored in two new columns as shown below:
Equipment EventList ctonline cteos prevctonline prevcteos
CT1 6/1/2014 7:43 StartTime 1 0
CT1 6/1/2014 8:42 End Of Start 0 1
CT1 6/1/2014 8:52 End Of Start 0 0
CT1 6/1/2014 9:16 End Of Start 0 0
CT1 6/2/2014 6:31 StartTime 1 0
CT1 6/2/2014 7:30 End Of Start 0 1
CT1 6/3/2014 5:22 StartTime 1 0
CT1 6/3/2014 6:19 End Of Start 0 1
CT1 6/3/2014 10:33 End Of Start 0 0
CT1 6/3/2014 18:08 End Of Start 0 0
CT1 6/4/2014 6:27 StartTime 1 0
CT1 6/4/2014 7:05 End Of Start 0 1
CT1 6/4/2014 7:13 End Of Start 0 0
CT1 6/5/2014 6:22 StartTime 1 0
CT1 6/5/2014 6:58 End Of Start 0 1
Now we can apply the transact command to group events that only fall between a valid StartTime and a valid End of Start event. Below is the code to achieve this.
...|streamstats current=f window=1 last(ctonline) as prevctonline last(cteos) as prevcteos | reverse | transaction Equipment startswith=eval(ctonline!=coalesce(prevctonline,"") and ctonline="Start") endswith=eval(cteos!=coalesce(prevcteos,"") and cteos="EndOfStart") | sort TimeStamp | timechart avg(duration) by Equipment
The result of the query above gives you the correct grouping for the Start Time and End of Start events .. without having to loop through the arrays as before.Hopefully this helps someone else.
Just updated the question with a link to the sample events and the logic behind the start and end times.
Please share some raw events. It may help try other directions as well..
Ok.Let me read through those in the meantime.Maybe my approach is wrong. Give it a crack and if need be I can post the native events that I searched to get the two arrays and see if there is another easier way around the problem.
I will give this a try but it seems very complex for Splunk search language (needs while looping) and more or less asks for custom search command where you can do this looping in a python script. Some readings
http://blogs.splunk.com/2014/04/14/building-custom-search-commands-in-python-part-i-a-simple-generat...
http://blogs.splunk.com/2011/11/30/using-custom-search-commands-with-splunk-python-sdk/
No really.After picking up the first start-pair (i.e "start time"- "end of start") a new start can occur only if its after the last "end of start" , if not then we would have to walk down the "start time" rows until we hit a start time that is AFTER the last "end of start". In my current list none of the starts will be skipped as they always occur after the previous pair's "end of start". But that contrary situation my happen.
These are multivalued fields i.e multiple rows exist. The number of rows in in the "start time" field may not be necessary the same as those in the "end of start" field.In my current case I have 43 rows in "start time" and 129 in the "end of start" many of which will be skipped since they dont match my condition.
The fields "start time" and "end of start" are multivalued field or single valued fields? how many rows are we talking about here?