The objective is to take events that indicate user activity, break the data down into segments of time, and then figure out which segments/bins should be marked to indicate that user activity took place during that time. Here is an example of the data I am looking at:
Start | End | Field_Name | Unique_Visitor_ID
a1    | a1  | AAA        | ZZZ
a1    | a2  | AAA        | YYY
a2    | a2  | AAA        | YYY
a3    | a4  | AAA        | ZZZ
a4    | a4  | AAA        | ZZZ
a5    | a6  | AAA        | YYY
a6    | a6  | AAA        | ZZZ
In the table above, the "Start" and "End" values indicate the start and end time segments that define a window of time during which user "Unique_Visitor_ID" was using resource "Field_Name". Think of it like the specification of a Gantt chart, where "Start" and "End" define the range of time windows during which a user was accessing a resource.
What we want to do is create a plot where we know how many users were using each resource during each time segment. The trick is that each event in the table can span multiple time segments, and each event is generated from a grouping of events into a transaction.
I have been able to generate a query that populates the chart like I want, but it is extremely inefficient because of how the combination of 'map' and 'gentimes' is being used. Any help on simplifying this would be greatly appreciated.
Here is the end of the query, where I try to produce a chartable result after forming the table above. Basically, bin all start and end times into 15-minute segments, and then run a map/gentimes command that breaks up each transaction event spanning multiple segments into individual events that each cover a single segment.
GENERATE_TRANSACTION_TABLE
| fields start end FieldName UniqueID
| bin start span=15m
| bin end span=15m
| stats max(end) as lasttime by UniqueID FieldName start
| stats min(start) as starttime by UniqueID FieldName lasttime
| stats values(UniqueID) as UniqueID by starttime lasttime FieldName    <-- filter and group like events
| eval starttime=strftime(starttime, "%m/%d/%Y:%H:%M:%S")
| eval lasttime=lasttime+1    <-- workaround to make gentimes work for events with the same start and end
| eval lasttime=strftime(lasttime, "%m/%d/%Y:%H:%M:%S")
| map maxsearches=1000 search="gentimes start=$starttime$ end=$lasttime$ increment=15m | eval FieldName=$FieldName$ | eval UniqueID=$UniqueID$"
| fields starttime FieldName UniqueID
| dedup starttime FieldName UniqueID
| makemv delim=" " UniqueID
| mvexpand UniqueID
| rename starttime as _time
| timechart span=15m dc(UniqueID) as Count by FieldName
Here is some example data to show the challenge. Assume this is what comes out of the transaction process.
Start       End         Field_Name  Unique_Visitor_ID
1434355312  1434355421  AAA         ZZZ
1434355534  1434357109  AAA         ZZZ
1434357201  1434358920  AAA         ZZZ
1434362435  1434378784  BBB         YYY
This is what the same data looks like after assigning time buckets with a span of 30 minutes
Start                End                  Field_Name  Unique_Visitor_ID
06/15/2015:09:00:00  06/15/2015:09:00:00  AAA         ZZZ
06/15/2015:09:00:00  06/15/2015:09:30:00  AAA         ZZZ
06/15/2015:09:30:00  06/15/2015:10:00:00  AAA         ZZZ
06/15/2015:11:00:00  06/15/2015:15:30:00  BBB         YYY
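For reference, that bucketing is just the 'bin' command with the epoch times reformatted for display. A rough sketch using the field names from the sample above (with a 30-minute span to match this example, versus the 15-minute span in the full query) would be:

| bin Start span=30m
| bin End span=30m
| eval Start=strftime(Start, "%m/%d/%Y:%H:%M:%S"), End=strftime(End, "%m/%d/%Y:%H:%M:%S")
| table Start End Field_Name Unique_Visitor_ID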
This is what the end result would look like after calling 'timechart':
_time                AAA  BBB
06/15/2015:09:00:00  1    0
06/15/2015:09:30:00  1    0
06/15/2015:10:00:00  1    0
06/15/2015:10:30:00  0    0
06/15/2015:11:00:00  0    1
06/15/2015:11:30:00  0    1
06/15/2015:12:00:00  0    1
06/15/2015:12:30:00  0    1
06/15/2015:13:00:00  0    1
06/15/2015:13:30:00  0    1
06/15/2015:14:00:00  0    1
06/15/2015:14:30:00  0    1
06/15/2015:15:00:00  0    1
06/15/2015:15:30:00  0    1
06/15/2015:16:00:00  0    0
I looked at search.log, and I see error messages like the one below every time it invokes 'gentimes'. After this message appears, the time until the next cycle is almost a second, as seen below.
06-16-2015 02:32:57.375 INFO DispatchThread - Error reading runtime settings: File :C:\Program Files\Splunk\var\run\splunk\dispatch\subsearch__search31434418371.2490_1434418377.5\runtime.csv does not exist
06-16-2015 02:32:57.375 INFO DispatchThread - Disk quota = 10485760000
06-16-2015 02:32:58.281 INFO script - Invoked script gentimes with 213 input bytes (0 events). Returned 2 output bytes in 891 ms.
Given the sample data, can you show me the desired output? (I am not at all following what you are explaining.)
Sorry, I hope this can help.
The following is what a segment of events looks like after the transactions have been made:
start          end            FieldName    UniqueID
1434283638.50  1434283944.90  Resource_AA  USER_YY
1434284009.43  1434284172.20  Resource_AA  USER_YY
1434284178.02  1434284240.24  Resource_AA  USER_YY
1434353495.28  1434353616.58  Resource_BB  USER_XX
1434353671.64  1434353753.25  Resource_BB  USER_XX
1434353833.93  1434353872.11  Resource_BB  USER_ZZ
1434353852.65  1434353868.99  Resource_AA  USER_ZZ
1434353861.75  1434353884.71  Resource_BB  USER_ZZ
1434353852.60  1434353931.20  Resource_AA  USER_ZZ
1434353902.17  1434353915.57  Resource_BB  USER_ZZ
1434353907.88  1434354047.3   Resource_AA  USER_ZZ
1434354040.52  1434354077.74  Resource_BB  USER_XX
This is what that series of data should look like after the call to 'dedup':
start                end                  FieldName    UniqueID
06/14/2015:13:00:00  06/14/2015:13:00:00  Resource_AA  USER_YY
06/14/2015:13:00:00  06/14/2015:13:15:00  Resource_AA  USER_YY
06/14/2015:13:15:00  06/14/2015:13:15:00  Resource_AA  USER_YY
06/15/2015:08:30:00  06/15/2015:08:30:00  Resource_BB  USER_XX
06/15/2015:08:30:00  06/15/2015:08:30:00  Resource_BB  USER_ZZ
06/15/2015:08:30:00  06/15/2015:08:30:00  Resource_AA  USER_ZZ
This is what the end result would look like (the unique user count for each resource in a given time bucket)
_time                Resource_AA  Resource_BB
06/14/2015:13:00:00  1            0
06/14/2015:13:15:00  1            0
06/15/2015:08:30:00  1            2
Basically, distill the data down to show the unique user count per resource per time bucket. The trick is that each time bucket can contain multiple complete transactions for a single resource from the same user; I can't really change that fact.
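To be concrete, once every spanning event has been broken down into single-segment events, the tail of the pipeline is essentially just a dedup and a distinct count per bucket; roughly (using the field names from the transaction output above, and assuming start is still an epoch value at that point):

| dedup start end FieldName UniqueID
| rename start as _time
| timechart span=15m dc(UniqueID) as Count by FieldName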
[Answer edited to correct some typos and places where I used the wrong field names]
First, I have not used 'gentimes' much, but I think this is what you want:
| map maxsearches=1000 search="gentimes start=$start_time$ end=$end_time$ increment=15m | fields starttime endtime $FieldName$ $UniqueID$"
I have played around with this, and I think that the 'mvexpand' could be a problem, but I would need to see the output of your GENERATE_TRANSACTION_TABLE to figure it out.
I would probably do it this way:
GENERATE_TRANSACTION_TABLE
| fields start end FieldName UniqueID
| dedup start end FieldName UniqueID
| bin start span=15m
| bin end span=15m
| sort UniqueID FieldName start
| streamstats current=f window=2 earliest(timestamp) as last_time by UniqueID FieldName
| eval timestamp=if(isnotnull(last_time),relative_time(last_time,"+15m"),start)
| rename timestamp as _time
| timechart span=15m dc(UniqueID) as Count by FieldName
Finally, I don't know how you are generating your transactions, but it will almost certainly be faster if you can use the 'stats' command rather than the 'transaction' command, if you can. And the 'table' command (on line 2) is not accomplishing anything for you in your original search; if you are trying to optimize field extraction, use 'fields' instead, or just leave it out.
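One more thought: if 'gentimes' itself turns out to be the slow part, a rough, untested sketch that avoids 'map' and 'gentimes' entirely would be to expand each binned event into its 15-minute buckets with 'mvrange' and 'mvexpand' (900 seconds = 15 minutes; field names taken from your search):

GENERATE_TRANSACTION_TABLE
| fields start end FieldName UniqueID
| bin start span=15m
| bin end span=15m
| eval bucket=mvrange(start, end + 900, 900)    <-- one value per 15-minute segment the event touches
| mvexpand bucket
| eval _time=bucket
| timechart span=15m dc(UniqueID) as Count by FieldName

Since 'mvexpand' does all of the expansion in a single pass, there is no per-event subsearch overhead.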
I appreciate your help and the suggestions.
'fields' is correct, thank you for that. As for 'stats', that part is not the bottleneck in my testing; it is always 'gentimes'. Your suggested 'gentimes' call is not helping speed it up either. In fact, if I trim it down to just run 'gentimes start=$start$ end=$end$ increment=15m', it still takes nearly a second for each invocation (see comment above).
Can you confirm the streamstats command in your example? Are you sure you meant to use the field "timestamp" in the call to 'earliest'? That field does not yet exist at this point in the pipeline.
Playing around with what you provided does not seem to quite get there. If you look at the original posting, I have added some sample data that should show the extent of the issue.
Try changing the window to 1:

| streamstats current=f window=1 earliest(timestamp) as last_time by UniqueID FieldName
And yes, I meant to use the field 'timestamp'; it won't exist for the first event, but it will for subsequent events. Note that I test for a null value of 'last_time' in the following command.
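Putting those two adjustments together, that part of the search would read roughly like this (same field names as in the answer above):

| streamstats current=f window=1 earliest(timestamp) as last_time by UniqueID FieldName
| eval timestamp=if(isnotnull(last_time), relative_time(last_time, "+15m"), start)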
Sorry about the typos in the original answer, I have updated it. My only excuse is that I had minor surgery the other day, so I was taking pain meds. 🙂