To start - I was suggested this solution, but despite the fact that the question is very similar the answer marked as a solution doesn't seem to actually provide the quantitative total that I am looking for.
I have a series of events where there is a Start and Stop time, in epoch time. These events can be grouped by a common field, `host`, and I am trying to determine the total amount of deduplicated time that these events span.
For example:
The total time for Host_1 would therefore be 50 minutes:
I had tried to leverage streamstats to get information about previous events, but couldn't work out how to get it to properly reset when the events didn't overlap.
Not even sure streamstats is the best method for solving this type of problem.
EDIT: some test data may be helpful.
0,"hostname","start_time","end_time"
1,"host_1","1654130041.626307","1654130566.626307"
2,"host_1","1654131696.975800","1654133451.975800"
3,"host_1","1654132454.687189","1654134263.687189"
4,"host_1","1654132747.975800","1654133451.975800"
5,"host_1","1654133805.740912","1654134236.740912"
6,"host_1","1654136688.170093","1654136722.170093"
7,"host_1","1654136782.300892","1654136818.300892"
8,"host_1","1654136885.031861","1654137288.031861"
9,"host_1","1654137388.801936","1654139394.801936"
Doing the math, rows numbered 3 and 4 both have `start_time` values that are earlier than row 1's `end_time` value - indicating that there would be a duration overlap occurring in several rows.
| eval event=mvrange(1,3)
| mvexpand event
| eval time=if(event=1,start,end)
| sort 0 time
| eval event=if(event=1,event,-1)
| streamstats sum(event) as concurrent by host
| eval start=if(concurrent=1 AND event=1,start,null())
| eval end=if(concurrent=0 AND event=-1,end,null())
| streamstats last(start) as start by host
| eval duration=if(concurrent=0,end-start,null())
| stats sum(duration) as duration by host
| eval duration=tostring(duration,"duration")
But if you have start and end at each event what more you need?
I'm not sure I follow. Are you suggesting to simply export the data from Splunk to perform the calculations on total time elsewhere?
"What more I need" - I want to be able to calculate the sum of all deduplicated/non-overlapping time by host. If you'd like, I can try to provide some example data to work with.
No, I mean that I was simply surprised at the simplicity since you have all the data needed for a simple
<your search>
| eval duration=end_time-start_time
| stats sum(duration)
But it seems I missed (or rather misinterpreted) the overlapping part. (I thought that you didn't care about the overlap so it really seemed too simple). Hence my reaction.
But to the point, I'd try something similar to what @ITWhisperer showed but from a slightly different angle.
Sorry, don't have my splunk installation at hand so can't provide exact solution, just a rough outline.
Firstly split the event into separate start/end events using untable.
Then you sort the events by time.
Set a field (let's call it optype) to 1 if it's a start event or -1 if it's a end event.
Streamstats sum(optype) as filter
Now you have a filter field - you're only interested in "balanced" values for this field so filter using where to get only those events which are starts with filter=1 or ends with filter=0. Other - "unbalanced" - values are the "internal" starts/ends.
Now you only have to combine them back into start/end pairs. For example by streamstats last(time) - this will give you start/end of a "same state" period. Now you only have to do "where" to only get the "end" events (with corresponding starts of course.
And you have "deduplicated" periods for duration calculation and stats(sum).
I hope it makes sense. I might be able to turn this into spl later.
untable (as @PickleRick suggested) works well for doubling up the events too
| makeresults
| eval _raw="0,hostname,start_time,end_time
1,host_1,1654130041.626307,1654130566.626307
2,host_1,1654131696.975800,1654133451.975800
3,host_1,1654132454.687189,1654134263.687189
4,host_1,1654132747.975800,1654133451.975800
5,host_1,1654133805.740912,1654134236.740912
6,host_1,1654136688.170093,1654136722.170093
7,host_1,1654136782.300892,1654136818.300892
8,host_1,1654136885.031861,1654137288.031861
9,host_1,1654137388.801936,1654139394.801936"
| multikv forceheader=1
| table hostname start_time end_time
| untable hostname event time
| sort 0 time
| eval event=if(event="start_time",1,-1)
| streamstats sum(event) as concurrent by hostname
| eval start_time=if(concurrent=1 AND event=1,time,null())
| eval end_time=if(concurrent=0 AND event=-1,time,null())
| streamstats last(start_time) as start_time by hostname
| eval duration=if(concurrent=0,end_time-start_time,null())
| stats sum(duration) as duration by hostname
| eval duration=tostring(duration,"duration")