Splunk Search

Concurrent users per time bucket from transactions

rjthibod
Champion

The objective is to take events that indicate user activity, break the data down into segments of time, and then figure out which segments/bins should be marked to indicate that user activity took place during that time. Here is an example of the data I am looking at:

Start | End  |  Field_Name  |  Unique_Visitor_ID
 a1   |  a1  |     AAA      |      ZZZ
 a1   |  a2  |     AAA      |      YYY
 a2   |  a2  |     AAA      |      YYY
 a3   |  a4  |     AAA      |      ZZZ
 a4   |  a4  |     AAA      |      ZZZ
 a5   |  a6  |     AAA      |      YYY
 a6   |  a6  |     AAA      |      ZZZ

In the table above, "Start" and "End" indicate the starting and ending time segments that define the window during which user "Unique_Visitor_ID" was using resource "Field_Name". Think of it like the specification of a Gantt chart, where "Start" and "End" define the range of time windows during which a user was accessing a resource.

What we want to do is create a plot showing how many users were using each resource during each time segment. The trick is that each event in the table can span multiple time segments, and each event is itself generated by grouping raw events into a transaction.

I have been able to write a query that populates the chart the way I want, but it is extremely inefficient because of how it combines 'map' and 'gentimes'. Any help simplifying this would be greatly appreciated.

Here is the end of the query, where I try to produce a chart-able result after forming the table above. Basically, it bins all start and end segments into 15-minute increments, and then runs a map/gentimes command that breaks each transaction event spanning multiple segments into individual events that each cover a single segment.

GENERATE_TRANSACTION_TABLE 
    | fields start end FieldName UniqueID 
    | bin start span=15m 
    | bin end span=15m
    | stats max(end) as lasttime by UniqueID FieldName start  
    | stats min(start) as starttime by UniqueID FieldName lasttime 
    | stats values(UniqueID) as UniqueID by starttime lasttime FieldName    (<-- filter and group like events)
    | eval starttime=strftime(starttime, "%m/%d/%Y:%H:%M:%S") 
    | eval lasttime=lasttime+1    (<-- workaround so gentimes works for events with the same start and end)
    | eval lasttime=strftime(lasttime, "%m/%d/%Y:%H:%M:%S") 
    | map maxsearches=1000    (<-- expensive: runs a separate gentimes search for every row)
       search="gentimes start=$starttime$ end=$lasttime$ increment=15m 
       | eval FieldName=$FieldName$ 
       | eval UniqueID=$UniqueID$"
    | fields starttime FieldName UniqueID
    | dedup starttime FieldName UniqueID 
    | makemv delim=" " UniqueID 
    | mvexpand UniqueID    (<-- split the grouped UniqueID values back into one row each)
    | rename starttime as _time 
    | timechart span=15m dc(UniqueID) as Count by FieldName
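For context on the cost: map runs its inner search once per input row, and each gentimes call generates one row per increment between $starttime$ and $lasttime$. A single iteration expands to something like the sketch below (illustrative values; the :01 on the end time comes from the lasttime+1 workaround above):

 | gentimes start=06/15/2015:09:00:00 end=06/15/2015:10:00:01 increment=15m 
 | eval FieldName="AAA" 
 | eval UniqueID="ZZZ"

Because every row spins up its own search, the cost grows with the number of transaction rows, which is where the inefficiency comes from.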

Here is some example data to show the challenge. Assume this is what comes out of the transaction process.

    Start         End      Field_Name   Unique_Visitor_ID
1434355312  1434355421     AAA            ZZZ
1434355534  1434357109     AAA            ZZZ
1434357201  1434358920     AAA            ZZZ
1434362435  1434378784     BBB            YYY

This is what the same data looks like after assigning time buckets with a span of 30 minutes:

         Start                End           Field_Name   Unique_Visitor_ID
06/15/2015:09:00:00    06/15/2015:09:00:00     AAA            ZZZ
06/15/2015:09:00:00    06/15/2015:09:30:00     AAA            ZZZ
06/15/2015:09:30:00    06/15/2015:10:00:00     AAA            ZZZ
06/15/2015:11:00:00    06/15/2015:15:30:00     BBB            YYY

This is what the end result would look like after calling timechart.

           _time              AAA               BBB
06/15/2015:09:00:00           1                     0 
06/15/2015:09:30:00           1                     0
06/15/2015:10:00:00           1                     0
06/15/2015:10:30:00           0                     0
06/15/2015:11:00:00           0                     1
06/15/2015:11:30:00           0                     1
06/15/2015:12:00:00           0                     1
06/15/2015:12:30:00           0                     1
06/15/2015:13:00:00           0                     1
06/15/2015:13:30:00           0                     1
06/15/2015:14:00:00           0                     1
06/15/2015:14:30:00           0                     1
06/15/2015:15:00:00           0                     1
06/15/2015:15:30:00           0                     1
06/15/2015:16:00:00           0                     0
1 Solution

rjthibod
Champion

This is the more generic approach, which is used heavily in one of my apps. It splits the time ranges marked by Start and End into 1-minute (60-second) time buckets, with the duration2 field indicating how much time within each bucket the event occupied. Assume Start and End are epoch times in seconds.

Note: @sideview was very close and much more detailed in terms of options for simpler cases (which is why I upvoted him). My case was more complicated and required the most generic approach I could come up with, i.e., transaction and concurrency couldn't handle it. A sketch of the simpler concurrency-based pattern follows for comparison, and then the generic approach.
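For the simpler cases, a concurrency-based version might look like this sketch (assuming epoch-second Start/End fields as above; note that concurrency counts overlapping events rather than distinct visitors, and cannot split the count by resource, which is why it didn't fit here):

 BASE SEARCH ... 
 | eval _time = Start
 | eval duration = End - Start    (<-- event length in seconds)
 | concurrency duration=duration    (<-- count of events overlapping each event's start time)
 | timechart span=30m max(concurrency) as Count

The generic approach: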

 BASE SEARCH ... 
 | eval earliest=Start
 | eval latest=End
 | eval duration = latest - earliest    (<-- total event duration in seconds)
 | eval start_time_min = floor(earliest - (earliest % 60))    (<-- round the start down to the minute)
 | eval end_time_min   = floor(latest - (latest % 60))    (<-- round the end down to the minute)
 | eval time_bins = mvrange(start_time_min, end_time_min + 1, 60)    (<-- one multivalue entry per minute the event touches)
 | mvexpand time_bins    (<-- one row per event per minute bucket)
 | eval duration2 = if(start_time_min == end_time_min, duration, if(start_time_min == time_bins, round(start_time_min + 60 - earliest, 3), if(end_time_min == time_bins, round(latest - end_time_min, 3), 60)))    (<-- seconds of the event inside this bucket: full duration if it fits in one bucket, partial in the first and last buckets, 60 otherwise)
 | rename time_bins as _time
 | table _time duration2 Field_Name Unique_Visitor_ID
 | eval _span = 60    (<-- mark the bucket width for charting)
 | ... do stats or whatever you need
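For example, to get the 30-minute concurrent-user chart shown in the question from these 1-minute rows, the last line could be replaced with a timechart over the distinct visitors (a sketch using the field names from the table command above):

 ... | timechart span=30m dc(Unique_Visitor_ID) as Count by Field_Name

Keeping duration2 around also lets you weight by time-in-bucket instead of just counting, e.g. sum(duration2) per bucket gives utilization rather than concurrency.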

