Solved: Timecharting the sum of a max value in a field.

cvervais · ‎04-28-2014

I think I have the hard part of this figured out but, I'm struggling with how to send it into time chart. Here's the search base:

index=main module=api extra.user_id=* extra.endpoint=set_playback_position NOT (extra.args.position="-1" OR extra.args.position="-3" OR extra.args.position="-2" OR Cohort=INT) |bucket _time span=1d |dedup extra.args.url extra.user_id date_mday|stats max(extra.args.position) by extra.args.url extra.user_id

When I run that, I get a nice set of stats showing the max value for extra.args.position for extra.user_id and extra.args.url. And, if I tack on stats sum it nicely sums up the max values of extra.args.position.

What I'm looking to do is put this on a column timechart where the height of the column is the sum of max extra.args.position. With each column showing a day or if I want to go further a week which I think I can control by using bucket and span. Bonus points for making each column a stacked column where the components of the stack are the sum of extra.args.position for a user_id.

martin_mueller · ‎04-28-2014

I don't see any evidence of max'ing in that example results, so that'd be this:

base search | timechart span=1d sum(position)

That's easily extensible to this:

base search | timechart span=1d sum(position) by URL

URLs will each get their own colour - it'll get messy if you want to split by user as well.
That can be done of course:

base search | eval user_url = user." - ".URL | timechart span=1d sum(position) by user_url

That'll give you a huge number of colours, and probably won't be usable.

I've changed your field names due to laziness 😛

Edit: This appears to be closest to the truth:

base search | eval day = relative_time(_time, "@d") | stats latest(_time) as _time max(position) as position by user url day | timechart span=1d avg(position) by url

View solution in original post

martin_mueller · ‎04-28-2014

I don't see any evidence of max'ing in that example results, so that'd be this:

base search | timechart span=1d sum(position)

That's easily extensible to this:

base search | timechart span=1d sum(position) by URL

URLs will each get their own colour - it'll get messy if you want to split by user as well.
That can be done of course:

base search | eval user_url = user." - ".URL | timechart span=1d sum(position) by user_url

That'll give you a huge number of colours, and probably won't be usable.

I've changed your field names due to laziness 😛

Edit: This appears to be closest to the truth:

base search | eval day = relative_time(_time, "@d") | stats latest(_time) as _time max(position) as position by user url day | timechart span=1d avg(position) by url

martin_mueller · ‎04-29-2014

Divide it by 60 then.

... | eval field = field / 60

cvervais · ‎04-29-2014

Not sure per_minute() is what I'm after. per_minute() seems to be more about how many events per minute. What I'm interested in is taking the value of position and dividing it by 60.

Also, can you edit your answer to put the winning search in it so I can up vote it and mark it answered?

martin_mueller · ‎04-29-2014

Take a look at the per_minute() functions: http://docs.splunk.com/Documentation/Splunk/6.0.3/SearchReference/CommonStatsFunctions

I'm not driving so keep 'em coming 🙂

cvervais · ‎04-29-2014

If I want I can break it down by user_id in the time chart which is nice. Dumb question though, is there a way on the time chart to convert the vertical axis to minutes by dividing the result by 60?

I'm going to go googling to try and find out too. I'm also going to see if anyone I know at Splunk is going to Hamburg and sending them with some euros to buy you a couple of pints.

cvervais · ‎04-29-2014

That gave me enough to go on and got me this:

index=main module=api hostname=PROD extra.user_id= extra.endpoint=set_playback_position NOT (extra.args.position="-1" OR extra.args.position="-3" OR extra.args.position="-2" OR extra.args.position=0 OR Cohort=INT OR Cohort=PB*) |eval day = relative_time(_time, "@d") | stats latest(_time) as _time max(extra.args.position) as position by extra.user_id extra.args.url day | timechart span=1d sum(position)

If I did it right, rather than take the average of the maximums, I'm adding them together to get the total amount of video watched.

martin_mueller · ‎04-28-2014

Ah. Something like this?

base search | eval day = relative_time(_time, "@d") | stats latest(_time) as _time max(position) as position by user url day | timechart span=1d avg(position) by url

I'm counting each user once per day per url, getting their maximum position for that. Then I'm calculating the average of those maximums for each url per day. Compared to the length of a video it should show how far into the video users watch.

cvervais · ‎04-28-2014

I'm a fan of lazy myself 🙂
Yup, those examples I can get. The reason I'm not looking to sum the position is because the position represents how many seconds into the video playback the current position is. Since we log that every 10 seconds, if I sum it I'll have a result that is longer than the video it's self.

That's why I was going with max. The maximum value of the position for that video will represent the duration watched.

Since what I'm trying to get at and show is how much video the users are watching on a user basis and not necessarily the videos themselves.

cvervais · ‎04-28-2014

Ah, thank you, that helped. Based on your table in your reply, I'd expect a chart with the horizontal axis being time and the vertical axis showing the calculated sum. So, in the chart there'd be a column for 04-28 that rises up to the value of 45. There'd be another column for 04-28 rising up to 47.

I think that's what you're asking.

martin_mueller · ‎04-28-2014

Yes. I can't make out clear computational rules from your text, so an example might help.

cvervais · ‎04-28-2014

Sorry about that, I'm confused. Are you asking for the example you gave what I would expect the chart would look like?

martin_mueller · ‎04-28-2014

That doesn't appear to match the data I posted. On top of imagining it's just a single user johndoe, also imagine that data is for a single URL http://example.com.

cvervais · ‎04-28-2014

Something like this:

timestamp extra.args.url max(extra.args.position) 04-28 http://some.video 60 04-28 http://some.other.video 34

martin_mueller · ‎04-28-2014

Imagine the data I posted above were for user johndoe - what should the output be?

cvervais · ‎04-28-2014

Maybe the easiest way to explain what I'm trying to get to is for each user, the sum of the highest recorded extra.playback.position for each extra.args.url. Then I can stack that into a column for each day, week, month, etc.

cvervais · ‎04-28-2014

Hi Martin, I get the feeling I'm going to owe you another pint. I was using bucket just to see if I could graph with things bucketed into daily buckets, I forgot to modify the dedup after I did that. With my limited (so far) knowledge of splunk, I put the dedup in to make sure i was getting the max value for that user_id and url combination. We log the position every 10 seconds, if I don't dedup and include _time I'm only getting the max value of position for that time period rather than the max value for that user_id url combination.

martin_mueller · ‎04-28-2014

I'm a bit confused by your use of bucket, dedup date_mday, and stats.

The bucket floors each _time to the start of that day, why are you using date_mday instead of this for dedup?
Why are you using dedup at all, shouldn't the max() make sure you're not getting duplicates in the end?
Finally, your stats is throwing out any _time info, making a timechart after that impossible.

Say you have this:

timestamp   position
04-28 08:00       10
04-28 10:00       20
04-28 14:00       15
04-29 11:00        5
04-29 13:00       42

What should the output be?

Timecharting the sum of a max value in a field.

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...

Join the Conversation

Timecharting the sum of a max value in a field.

[Puzzles] Solve, Learn, Repeat: Dynamic formatting from XML events

Enter the Agentic Era with Splunk AI Assistant for SPL 1.4

Stronger Security with Federated Search for S3, GCP SQL & Australian Threat ...