
Concurrent users per time bucket from transactions

Champion

The objective is to take events that indicate user activity, break the data down into segments of time, and then figure out which segments/bins should be marked to indicate that user activity took place during that time. Here is an example of the data I am looking at:

Start | End  |  Field_Name  |  Unique_Visitor_ID
 a1   |  a1  |     AAA      |      ZZZ
 a1   |  a2  |     AAA      |      YYY
 a2   |  a2  |     AAA      |      YYY
 a3   |  a4  |     AAA      |      ZZZ
 a4   |  a4  |     AAA      |      ZZZ
 a5   |  a6  |     AAA      |      YYY
 a6   |  a6  |     AAA      |      ZZZ

In the table above, the "Start" and "End" values indicate the start and end time segments that define a window of time during which user "Unique_Visitor_ID" was using resource "Field_Name". Think of it like the specification of a Gantt chart, where "Start" and "End" define the range of time windows during which a user was accessing a resource.

What we want to do is create a plot showing how many users were using each resource during each time segment. The trick is that each event in the table can span multiple time segments (for example, with 15-minute buckets, an event running from 09:05 to 09:40 should count in the 09:00, 09:15, and 09:30 buckets), and each event is generated from a grouping of events into a transaction.

I have been able to generate a query that populates the chart the way I want, but it is extremely inefficient because of how it combines 'map' and 'gentimes'. Any help simplifying this would be greatly appreciated.

Here is the end of the query, where I try to produce a chart-able result after forming the table above. Basically, bin all start and end times into 15-minute segments, and then run a map/gentimes command that breaks up every transaction event spanning multiple segments into individual events that each cover a single segment.

GENERATE_TRANSACTION_TABLE 
    | fields start end FieldName UniqueID 
    | bin start span=15m 
    | bin end span=15m
    | stats max(end) as lasttime by UniqueID FieldName start  
    | stats min(start) as starttime by UniqueID FieldName lasttime 
    | stats values(UniqueID) as UniqueID by starttime lasttime FieldName  (<-- filter and group like events)
    | eval starttime=strftime(starttime, "%m/%d/%Y:%H:%M:%S") 
    | eval lasttime=lasttime+1    (<-- this is a workaround to make gentimes work for events with same start and end)
    | eval lasttime=strftime(lasttime, "%m/%d/%Y:%H:%M:%S") 
    | map maxsearches=1000 
       search="gentimes start=$starttime$ end=$lasttime$ increment=15m 
       | eval FieldName=$FieldName$ 
       | eval UniqueID=$UniqueID$"
    | fields starttime FieldName UniqueID
    | dedup starttime FieldName UniqueID 
    | makemv delim=" " UniqueID 
    | mvexpand UniqueID 
    | rename starttime as _time 
    | timechart span=15m dc(UniqueID) as Count by FieldName

Here is some example data to show the challenge. Assume this is what comes out of the transaction process.

    Start          End       Field_Name   Unique_Visitor_ID
1434355312    1434355421       AAA             ZZZ
1434355534    1434357109       AAA             ZZZ
1434357201    1434358920       AAA             ZZZ
1434362435    1434378784       BBB             YYY

This is what the same data looks like after assigning time buckets with a span of 30 minutes:

         Start                End           Field_Name   Unique_Visitor_ID
06/15/2015:09:00:00    06/15/2015:09:00:00     AAA            ZZZ
06/15/2015:09:00:00    06/15/2015:09:30:00     AAA            ZZZ
06/15/2015:09:30:00    06/15/2015:10:00:00     AAA            ZZZ
06/15/2015:11:00:00    06/15/2015:15:30:00     BBB            YYY
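
For reference, that conversion is just the two bin commands flooring each epoch value to the start of its bucket, plus strftime for display; a minimal sketch of that step (same field names as the query above):

    | bin start span=30m 
    | bin end span=30m
    | eval Start=strftime(start, "%m/%d/%Y:%H:%M:%S")
    | eval End=strftime(end, "%m/%d/%Y:%H:%M:%S")

For example, with span=30m the first start value 1434355312 floors to 1434355200, which renders as 06/15/2015:09:00:00 in the timezone used here.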

This is what the end result would look like after calling timechart.

           _time              AAA               BBB
06/15/2015:09:00:00           1                     0 
06/15/2015:09:30:00           1                     0
06/15/2015:10:00:00           1                     0
06/15/2015:10:30:00           0                     0
06/15/2015:11:00:00           0                     1
06/15/2015:11:30:00           0                     1
06/15/2015:12:00:00           0                     1
06/15/2015:12:30:00           0                     1
06/15/2015:13:00:00           0                     1
06/15/2015:13:30:00           0                     1
06/15/2015:14:00:00           0                     1
06/15/2015:14:30:00           0                     1
06/15/2015:15:00:00           0                     1
06/15/2015:15:30:00           0                     1
06/15/2015:16:00:00           0                     0

Re: Concurrent users per time bucket from transactions

Champion

I looked at search.log, and I see error messages like the one below every time it invokes 'gentimes'. After this message appears, the time until the next cycle is almost a second, as seen below; it looks like each map iteration dispatches its own subsearch, so that per-invocation overhead adds up quickly.

06-16-2015 02:32:57.375 INFO DispatchThread - Error reading runtime settings: File :C:\Program Files\Splunk\var\run\splunk\dispatch\subsearch__search31434418371.2490_1434418377.5\runtime.csv does not exist
06-16-2015 02:32:57.375 INFO DispatchThread - Disk quota = 10485760000
06-16-2015 02:32:58.281 INFO script - Invoked script gentimes with 213 input bytes (0 events). Returned 2 output bytes in 891 ms.


Re: Concurrent users per time bucket from transactions

Esteemed Legend

Given the sample data, give me the desired output (I am not at all following what you are explaining).


Re: Concurrent users per time bucket from transactions

Champion

Sorry, I hope this can help.

The following is what a segment of events looks like after the transactions have been built:

    start               end             FieldName       UniqueID        
1434283638.50   1434283944.90   Resource_AA     USER_YY
1434284009.43   1434284172.20   Resource_AA     USER_YY
1434284178.02   1434284240.24   Resource_AA     USER_YY
1434353495.28   1434353616.58   Resource_BB     USER_XX
1434353671.64   1434353753.25   Resource_BB     USER_XX
1434353833.93   1434353872.11   Resource_BB     USER_ZZ
1434353852.65   1434353868.99   Resource_AA     USER_ZZ
1434353861.75   1434353884.71   Resource_BB     USER_ZZ
1434353852.60   1434353931.20   Resource_AA     USER_ZZ
1434353902.17   1434353915.57   Resource_BB     USER_ZZ
1434353907.88   1434354047.30   Resource_AA     USER_ZZ
1434354040.52   1434354077.74   Resource_BB     USER_XX 

This is what that series of data should look like after the call to 'dedup':

     start              end                   FieldName         UniqueID
06/14/2015:13:00:00     06/14/2015:13:00:00     Resource_AA     USER_YY
06/14/2015:13:00:00     06/14/2015:13:15:00     Resource_AA     USER_YY
06/14/2015:13:15:00     06/14/2015:13:15:00     Resource_AA     USER_YY
06/15/2015:08:30:00     06/15/2015:08:30:00     Resource_BB     USER_XX
06/15/2015:08:30:00     06/15/2015:08:30:00     Resource_BB     USER_ZZ
06/15/2015:08:30:00     06/15/2015:08:30:00     Resource_AA     USER_ZZ

This is what the end result would look like (the unique user count for each resource in a given time bucket):

_time                    Resource_AA      Resource_BB
06/14/2015:13:00:00             1                 0
06/14/2015:13:15:00             1                 0
06/15/2015:08:30:00             1                 2

Re: Concurrent users per time bucket from transactions

Champion

Basically, distill the data down to show the unique user count per resource per time bucket. The trick is that each time bucket can have multiple complete transactions for a single resource from the same user; I can't really change that fact.
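
To put it another way, once every transaction is reduced to one row per time bucket it touches, the counting itself is the easy part, because a distinct count ignores those repeats; a minimal sketch of just the counting step (assuming starttime is already bucketed):

    | stats dc(UniqueID) as Users by starttime FieldName    (<-- dc counts each user once per bucket, however many transactions they had)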


Re: Concurrent users per time bucket from transactions

Legend

[Answer edited to correct some typos and where I used wrong field names]

First, I have not used map and gentimes much, but I think this is what you want:

| map maxsearches=1000 
      search="gentimes start=$start_time$ end=$end_time$ increment=15m 
     | fields starttime endtime $FieldName$ $UniqueID$" 

I have played around with this, and I think that the makemv and mvexpand could be a problem, but I would need to see the output of your GENERATE_TRANSACTION_TABLE to figure it out.

I would probably do it this way:

GENERATE_TRANSACTION_TABLE 
| fields start end FieldName UniqueID  
| dedup start end FieldName UniqueID 
| bin start span=15m 
| bin end span=15m
| sort UniqueID FieldName start
| streamstats current=f window=2 earliest(timestamp) as last_time by UniqueID FieldName
| eval timestamp=if(isnotnull(last_time),relative_time(last_time,"+15m"),start)
| rename timestamp as _time 
| timechart span=15m dc(UniqueID) as Count by FieldName
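
If gentimes itself stays slow, a map-free way to expand each transaction into its buckets is mvrange plus mvexpand, so nothing dispatches a subsearch per row. This is only a sketch (untested against your data, and it assumes start and end are epoch seconds):

GENERATE_TRANSACTION_TABLE 
| fields start end FieldName UniqueID 
| bin start span=15m 
| bin end span=15m
| eval bucket=mvrange(start, end+900, 900)    (<-- one value per 15m bucket the transaction touches)
| mvexpand bucket                             (<-- one event per bucket)
| rename bucket as _time 
| timechart span=15m dc(UniqueID) as Count by FieldName

Since mvrange(X,Y,Z) stops before Y, adding one span (900 seconds) keeps the end bucket and also covers transactions whose start and end land in the same bucket, with no +1 workaround needed. Duplicate user/resource/bucket rows are harmless because dc() counts each user once.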

Finally, I don't know how you are generating your transactions, but it will almost certainly be faster if you can use the stats command rather than the transaction command; a sketch follows below. Also, the table command (on line 2) is not accomplishing anything in your original search: if you are trying to optimize field extraction, use fields instead, or just leave it out.
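
For example, if your events happen to carry a field that delimits a session (a hypothetical SessionID here), the whole transaction step collapses to a single stats call; this is only a sketch, since I don't know what actually defines your groupings:

BASE_SEARCH
| stats min(_time) as start, max(_time) as end by UniqueID FieldName SessionID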


Re: Concurrent users per time bucket from transactions

Champion

I appreciate your help and the suggestions.

  • The comment about table versus fields is correct; thank you for that.
  • The transactions cannot be generated using stats. Also, that is not the bottleneck in my testing; it is always gentimes.
  • Your suggestion for changing gentimes does not speed it up. In fact, if I trim it down to just gentimes start=$start$ end=$end$ increment=15m, it still takes nearly a second per invocation (see my comment above).
  • I am still testing your suggested approach and will come back with results in a bit.

Re: Concurrent users per time bucket from transactions

Champion

Can you confirm the streamstats command in your example? Are you sure you meant to use the field "timestamp" in the call to earliest? That field does not yet exist at this point in the pipeline.


Re: Concurrent users per time bucket from transactions

Champion

Playing around with what you provided, I can't quite get there. If you look at the original post, I have added some sample data that should show the extent of the issue.


Re: Concurrent users per time bucket from transactions

Legend

Try changing the window to 1:

| streamstats current=f window=1 earliest(timestamp) as last_time by UniqueID FieldName

And yes, I meant to use the field timestamp. It won't exist for the first event, but it will for subsequent events; note that I test for a null value of last_time in the following command.
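
If it helps, here is a self-contained toy pipeline (using gentimes only to fabricate three rows) that shows the behavior: the first row gets a null last_time, and each later row gets the previous row's timestamp.

| gentimes start=-1 increment=8h 
| streamstats count as n 
| eval timestamp=n*900 
| streamstats current=f window=1 earliest(timestamp) as last_time 
| table n timestamp last_time

That should return last_time=null for n=1, last_time=900 for n=2, and last_time=1800 for n=3.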

Sorry about the typos in the original answer, I have updated it. My only excuse is that I had minor surgery the other day, so I was taking pain meds. 🙂
