Splunk Search

How to distribute an event among many time buckets taking into account the duration of the event?

Path Finder

I have a group of entries that each have starttime, endtime, duration, and name. Some of them are concurrent, some are not.

I would like to create a table that has time buckets of 5 seconds and shows the duration of each element in each time bucket.

For instance, for the following values:


starttime=0 endtime=10 duration=10 name=test1
starttime=3 endtime=15 duration=12 name=test1
starttime=5 endtime=8 duration=3 name=test2
starttime=9 endtime=11 duration=2 name=test2

this would create a table like the following:

Bucket(_time)   test1 (list of durations in bucket)   test2 (list of durations in bucket)
_time=0         5,2                                   0,0
_time=5         5,5                                   3,1
_time=10        0,5                                   0,1
_time=15        0,0                                   0,0
TOTALS          10,12                                 3,2

or, even better, using the max function per bucket:

Bucket(_time)   test1 (max duration in bucket)   test2 (max duration in bucket)
_time=0         5                                0
_time=5         5                                3
_time=10        5                                1
_time=15        0                                0

I believe the key lies somewhere in the following searches:

|timechart span=5s list(duration) as dur |where dur>0

or

|timechart span=5s list(duration) by name
(where the column for each name has a value >0)

But I fail to see how to divide the events into the different buckets, since I only have one event with its duration, not multiple events that fall in each bucket. Can somebody help me?
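For anyone who wants to sanity-check the expected numbers, here is a small Python sketch (illustrative only, this is not SPL) that splits each event's duration across fixed 5-second buckets and reproduces the example table above:

```python
SPAN = 5

# The four example events from the question.
events = [
    {"start": 0, "end": 10, "name": "test1"},
    {"start": 3, "end": 15, "name": "test1"},
    {"start": 5, "end": 8,  "name": "test2"},
    {"start": 9, "end": 11, "name": "test2"},
]

def durations_per_bucket(start, end, span=SPAN):
    """Return {bucket_start: seconds of [start, end) inside that bucket}."""
    out = {}
    b = start - start % span          # align the first bucket boundary
    while b < end:
        # overlap of [start, end) with [b, b + span)
        out[b] = min(end, b + span) - max(start, b)
        b += span
    return out

# Collect per-name lists of in-bucket durations, as in the example table.
table = {}
for ev in events:
    for bucket, dur in durations_per_bucket(ev["start"], ev["end"]).items():
        table.setdefault(bucket, {}).setdefault(ev["name"], []).append(dur)
```

Here `table[0]["test1"]` comes out as `[5, 2]` and `table[5]["test2"]` as `[3, 1]`, matching the first table; absent name/bucket pairs are simply missing rather than shown as 0.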

1 Solution

SplunkTrust

One extra nuance: if an event has a duration greater than 10 seconds, it will completely span at least one of your 5-second buckets without starting or ending in it, so you have to make sure it still gets counted for that bucket.

This sort of thing is fixable by using mvrange() and mvexpand to fabricate extra copies of the events to fill all the gaps.

Skipping to the end, here's a full search that I think will do what you need. Again, I'm assuming that you have start_time, end_time, duration and name on each event, that start_time and end_time are epoch-time values, and that duration is measured in seconds.

<your search terms>
| eval bucket_start=mvrange(start_time,end_time,5) 
| mvexpand bucket_start
| bin bucket_start span=5s
| eval duration_correction1=max(bucket_start-start_time,0)
| eval duration_correction2=max(end_time-bucket_start-5,0)
| eval duration_within_bucket=duration - duration_correction1 - duration_correction2
| stats max(duration_within_bucket) as duration by name

Walking through this, we make a field called bucket_start and use mvrange() to assign it a multivalue value. If starttime were 1002 and endtime were 1018, this would be 1002,1007,1012,1017. Now say a given event has N multivalue values for bucket_start. We then immediately use mvexpand to turn each of our events into N copies of the event, each with one particular value for bucket_start. (Weird stuff, but useful for a variety of advanced concurrency cases.)

At this point we've created lots of little sets of events, but they're not all aligned on the same 0,5,10,15 fenceposts, so we use the bin command to round our bucket_start values down to the nearest 5 seconds.

The next challenge is that for each of our 5-second buckets we have to calculate, for that value of name, how much of the total duration actually fell in that given 5-second bucket. This is easy to work out with pencil and paper, and it amounts to snipping off the parts of the duration that fall outside our bucket. The seconds that fall earlier than our bucket are duration_correction1, and the seconds that fall later than our bucket are duration_correction2. Once we subtract those off, we have a duration_within_bucket that is actually accurate.

And then we're on the home stretch, because | stats max(duration_within_bucket) as duration by name will roll it all up and give you what you need. I think.
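The correction arithmetic is easy to verify in a few lines of Python (a transcription of the eval expressions above, using the same field names; illustrative only, not SPL):

```python
SPAN = 5

def duration_within_bucket(start_time, end_time, bucket_start, span=SPAN):
    """Seconds of [start_time, end_time) inside [bucket_start, bucket_start + span),
    computed exactly the way the SPL pipeline above does it."""
    duration = end_time - start_time
    correction1 = max(bucket_start - start_time, 0)        # seconds before the bucket
    correction2 = max(end_time - bucket_start - span, 0)   # seconds after the bucket
    return duration - correction1 - correction2

def buckets_for(start_time, end_time, span=SPAN):
    """mvrange(start_time, end_time, span) followed by bin: one aligned
    bucket_start per fabricated copy of the event."""
    raw = range(start_time, end_time, span)                # mvrange
    return sorted({b - b % span for b in raw})             # bin span=5s

# Example from the walkthrough: starttime=1002, endtime=1018 gives the
# raw values 1002,1007,1012,1017, binned down to 1000,1005,1010,1015.
```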



Path Finder

Dear @sideview, your reply put me on the right path, but I still haven't reached the final result. What I've done so far is the following:

|table ELEMENT_NAME ELEMENT_ALIAS DESCRIPTION Duration START_TIME END_TIME
|eval bucket_time_size=1800
|eval start_bucket_time=START_TIME-(START_TIME%bucket_time_size)
|eval end_bucket_time=END_TIME-(END_TIME%bucket_time_size)
|eval bucket_start=mvrange(start_bucket_time,end_bucket_time,bucket_time_size)
|mvexpand bucket_start
|eval _time=bucket_start
|eval duration_bucket=if(_time<=START_TIME,if(_time+bucket_time_size>END_TIME,END_TIME-START_TIME,(_time+bucket_time_size)-START_TIME),if(_time+bucket_time_size>END_TIME,END_TIME-_time,bucket_time_size))
|eval availability=duration_bucket/bucket_time_size
|timechart span=1800s avg(availability) as duration_bucket by ELEMENT_ALIAS
|fillnull value=1    

The first thing I do is define the bucket size in a variable (so I can change it whenever I want). The second is to align all the buckets to the same boundaries; that way it is easier to compare whether two events were in the same bucket. I do the same with the end bucket. Aligning the buckets is important: otherwise you can't compare different elements and you can't really see what runs in parallel.
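For reference, here is a Python transcription of the nested if() that computes duration_bucket above (small numbers used instead of the 1800-second bucket, purely for illustration):

```python
def duration_bucket(bucket_start, start, end, size):
    """Python transcription of the nested if() eval: seconds of the event
    [start, end) that fall inside [bucket_start, bucket_start + size)."""
    if bucket_start <= start:
        # Bucket begins at or before the event.
        if bucket_start + size > end:
            return end - start                    # event fits entirely in this bucket
        return bucket_start + size - start        # event runs past the bucket's end
    # Bucket begins mid-event.
    if bucket_start + size > end:
        return end - bucket_start                 # event ends inside this bucket
    return size                                   # event covers the whole bucket

def availability(bucket_start, start, end, size=1800):
    """Fraction of the bucket occupied by the event, as in the SPL above."""
    return duration_bucket(bucket_start, start, end, size) / size
```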


SplunkTrust

nadid - my answer was already aligning the buckets, using the bin command, which rounds all the bucket starts down to the same increments.

Also there is no need to assign the new times to "_time" to calculate concurrency, unless you want to use the concurrency command, and here we do not. Using _time instead of a more descriptive variable name makes the calculations harder to follow.

Your calculation for duration_bucket seems a little off, at least by my pencil-and-paper math. The way I was calculating it seems cleaner, without all the if() statements: | eval duration_within_bucket=duration - duration_correction1 - duration_correction2

Lastly, I think I see what you're trying to do when you create the availability field, but taking the average of that field is a little unexpected given your original question. This makes me think the reason you didn't like my answer is that I didn't really know what you were trying to get in the end.

My feeling is that maybe when you tried to rename all the fields in my answer to match your actual fields, one got missed somehow and you started changing structural parts around. I hope this helps.


New Member

The original post was very close, but often gave me too few buckets.

For example, with a 15s event starting at time 2s, I want 4 buckets: [2,5), [5,10), [10,15), [15,17). The code as written gave me 3.

For my 1-minute buckets, the change below resolved the problem.

| eval bucket_start=mvrange(relative_time(start_time, "-0m@m"), end_time, 60)
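A quick Python sketch of the off-by-one (hypothetical values, 5-second span rather than the 1-minute one above, mimicking mvrange and bin): without snapping the range's start down to a bucket boundary, the last bucket never gets a fabricated copy.

```python
SPAN = 5

def mvrange(start, end, step):
    # Mimic Splunk's mvrange(): start, start+step, ... strictly below end.
    return list(range(start, end, step))

def binned(values, span=SPAN):
    # Mimic | bin span=5s: round each value down to a bucket boundary.
    return sorted({v - v % span for v in values})

start, end = 2, 17   # a 15-second event starting at t=2

# As originally written: only 3 buckets; the [15, 17) bucket is lost.
too_few = binned(mvrange(start, end, SPAN))

# The fix: snap the start down to a bucket boundary before mvrange
# (the SPL above does this with relative_time(start_time, "-0m@m")).
aligned_start = start - start % SPAN
fixed = binned(mvrange(aligned_start, end, SPAN))
```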


Path Finder

Then we do the magic of mvrange() with the newly created variables (thanks for this command, it is really useful!).
After that I expand the events, to have one event for each bucket. Since I want to see the concurrent elements, I assign the new times to _time.
Now we have all the events ordered by _time and aligned in buckets.
For each bucket we then calculate the duration of each event; basically we see how much of the bucket is occupied by that event.
Finally I compute the availability as the fraction of the bucket covered by the duration, take the avg or max of the availability for each range, and fill the empty cells of the table with 1s.


Path Finder

What I take from this is: if you want to measure the concurrency of elements that have a duration, you should:
* Align the buckets to the period boundaries that you like

 |eval bucket_time_size=1800
 |eval start_bucket_time=START_TIME-(START_TIME%bucket_time_size)
 |eval end_bucket_time=END_TIME-(END_TIME%bucket_time_size)

Create, in a new field, the range of buckets each event covers, using the bucket size you defined

 |eval bucket_start=mvrange(start_bucket_time,end_bucket_time,bucket_time_size)

Expand the variable (and the events)

 |mvexpand bucket_start

Assign _time from the newly expanded field.

 |eval _time=bucket_start

With that you will have the concurrent elements in a given frame of time. Calculating how much of each bucket is really used works like the duration_bucket calculation in the previous answer.


Motivator

So you can calculate concurrency, but that's not exactly what you're asking for, I think.

... | concurrency duration=duration start=start_time output=concurrent 

This will add a field to each event showing the number of concurrent events as of that event's timestamp. You can then pipe that to timechart and see concurrency over time.
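A rough Python analogue of what the command computes (an illustration, not the concurrency command itself): for each event, count how many events are in flight at its start time.

```python
def concurrency(events):
    """For each (start, duration) event, count events whose interval
    [start, start + duration) contains this event's start time
    (a rough analogue of Splunk's concurrency command)."""
    out = []
    for s, d in events:
        out.append(sum(1 for s2, d2 in events if s2 <= s < s2 + d2))
    return out

# The four example events from the question, as (start, duration) pairs.
events = [(0, 10), (3, 12), (5, 3), (9, 2)]
```

On the question's example this yields a concurrency of 1, 2, 3 and 3 for the four events respectively.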


Path Finder

The concurrency command tells me how many concurrent elements there are, but it does not really allow me to check the duration of the concurrent elements. I've tried to play with it, but I was not able to obtain anything suitable. In any case, I really appreciate the answer 🙂


Builder

The timestamp is missing in the raw events.


Path Finder

Dear @satishdange,

I was assuming that one of the first commands would be |eval _time=start_time |timechart span=5s list(duration) as dur |where dur>0, so that _time is defined.
Digging in answers.splunk.com I found another, really similar question, http://answers.splunk.com/answers/122260/timechart-of-event-with-duration.html , which asks something like: a restaurant logs, per seat, when a customer sits down and stands up; every customer takes a different amount of time eating; I want to know how many customers we have in a specific 15-minute time window, as a graph.
(In the end it is a really similar question.)
