Splunk Search

Retrieving duration between two events with multiple end events

handosplunk2
Observer

Hello all,

I'm trying to get a duration between the first "started" event, and the first "connected" event following started, grouped by each user id.

 

The Data

I'm trying to get an event that is going to be structured like the following (assume these have all have real timestamps. I am abbreviating it to be short. The item numbers on the left are for annotation purposes only)

(item no. for annotation purposes only)userIdstatus_time (abbreviated)
01started00:00
11connected00:05
22started00:30
32connected00:40
42connected01:30
54started02:00
63connected02:05
73started02:10
83connected02:20
94connected02:30
105started3:00



What i'm looking to achieve:

A) I need to make sure i start the clock whenever the user has a "started" state.  (e.g., item no. 6 should be neglected)
B) It must take the first connected event following "started". (e.g., item no. 3 is the end item, with item no.4 being ignored completely)
C) I want to graph the number of users bucketed by intervals of 15 seconds.
D) There must be a start and connected event. (e.g. userId 5 would not be added)

How would i approach this?  I tried to do the following:

... status="started" OR status="connected" 
| stats range(_time) AS duration BY userId
| where duration > 0
| bin span 15 duration
| stats dc(userid) as Users by duration

 

But this isn't  quite doing what I want it to do.  And, I also get events where there's no duration.

Labels (5)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Using streamstats is the more powerful solution and as @PickleRick says it can handle the case where you have multiple started and connected events for the same user. (Think reset_before=).

The issue you need to consider is your data volume. transaction is not good with large data volumes and long spans and will not easily handle the multiple connected events and streamstats needs to move all the data to the search head.

There is a 3rd solution using stats but it would need some good knowledge of playing around with multivalued fields.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

This is better achieved with transaction.

| transaction userId startswith="status=started" endswith="status=connected"

 

0 Karma

PickleRick
SplunkTrust
SplunkTrust

While transaction can be indeed a more intuitive solution, similar solution can be probably achieved with streamstats.

First "split" your times to have it as

| eval starttime=if(status=="started",_time,null())
| eval endtime=if(status=="connected",_time,null())

Then you can "collect" your times with

| streamstats min(starttime) as timestarted min(endtime) as timeended by userId

If you now want to remove the "extra" events with "connected" status, you can do

| where NOT (status=="connected" AND _time!=timeended)

Bonus - you can use the same approach if there is a possibility of multiple "started" events (we'll leave it as an exercise for the reaer ;-))

Now you're left with just one "started" event and one "connected" event.

So we're down to your last requirement - ther must be both "start" event and "connected" event. This can be done in various ways but we can do it along our duration calculation

| stats range(_time) as duration values(state) as states by userId

And we only want to keep those results which have both states

| where states="started" AND states="connected"

Now you're all set to group, summarize, bin, and do whatever you want with your results.

To be quite honest, I'm not sure what you mean by your C) requirement. Because if it does what I think may mean, the above search is completely not what you want 😄

0 Karma

handosplunk2
Observer

Will this also dedup start and ending transactions as well?

0 Karma
Get Updates on the Splunk Community!

Join Us for Splunk University and Get Your Bootcamp Game On!

If you know, you know! Splunk University is the vibe this summer so register today for bootcamps galore ...

.conf24 | Learning Tracks for Security, Observability, Platform, and Developers!

.conf24 is taking place at The Venetian in Las Vegas from June 11 - 14. Continue reading to learn about the ...

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...