
Re: Concurrent users per time bucket from transactions

Champion

I am sorry, but the streamstats command you provided does not produce a timestamp field when one doesn't exist. I just tested it to confirm.

Separately, I don't quite understand what you are trying to accomplish with that command. Please clarify. If it helps, use the sample data I posted in the original question to demonstrate what you intend.


Re: Concurrent users per time bucket from transactions

Path Finder

This worked for me in diagnosing concurrent API calls from customers.

I consider myself reasonably good at Splunk, but wow.

(bows in the presence of a Splunk master)


Re: Concurrent users per time bucket from transactions

Legend

As an alternative, you might look at this question, which has a couple of suggestions for how to build a Gantt chart in Splunk:

Building a gantt chart

and this free app

Gantt chart visualization

I haven't used either of these...


Re: Concurrent users per time bucket from transactions

Champion

Thank you, I have seen those before.

The need I am addressing here is not to build a Gantt chart. I could do that with some JS with my original data. Instead, I need to count concurrent users split by a specific field value in the data.


Re: Concurrent users per time bucket from transactions

Champion

This is the old, non-scalable answer for my very specific use case. See the accepted answer for the more useful, generic one.

My solution in the comments of the original post did work, but it was extremely inefficient because gentimes is invoked so many times. Instead of massaging gentimes and map to work in an odd manner, I just wrote my own command.

Below is the Python code for my new command, which does what I wanted.

Assume the command is called tranexpand. It is invoked with arguments span and fields, where span is the size of the time buckets for the events parsed by the command and fields is a comma-separated list of fields that need to be maintained in the output data. An example invocation looks like tranexpand span=30m fields="field1,field2".

The command assumes the events being passed to it include fields starttime and endtime, which hold the start and end time buckets for the event. For example, if a transaction/event started at 8:15:12am and ended at 8:37:35am and the bucket size is 30 minutes, the starttime field would be the equivalent of 8:00:00am and the endtime field would be the equivalent of 8:30:00am.

import re
import splunk.Intersplunk

def getSpan(val):
    """Convert a span string like '30m' into a number of seconds."""
    if not val:
        return None
    match = re.findall(r"(\d+)([smhd])", val)

    if len(match) > 0:
        val = int(match[0][0])
        units = match[0][1]
        # don't do anything for units == 's', val doesn't need to change
        if units == 'm':
            val *= 60
        elif units == 'h':
            val *= 3600
        elif units == 'd':
            val *= (24 * 3600)
        return val
    return None


def generateNewEvents(results, settings):
    try:
        keywords, argvals = splunk.Intersplunk.getKeywordsAndOptions()
        spanstr = argvals.get("span", None)
        span = getSpan(spanstr)
        fields = argvals.get("fields", None)

        if not span:
            return splunk.Intersplunk.generateErrorResults(
                "generateNewEvents requires span=val[s|m|h|d]")

        if not fields:
            return splunk.Intersplunk.generateErrorResults(
                "generateNewEvents requires comma separated" +
                " field list wrapped in quotes: fields=\"A[,B[...]]\"")

        fields = fields.split(',')

        new_results = []

        # expand each result into one event per time bucket
        # between its starttime and endtime
        for r in results:
            start = r.get("starttime", None)
            end = r.get("endtime", None)

            if (start is not None) and (end is not None):
                try:
                    start = int(float(start))
                    end = int(float(end)) + 1

                    for x in range(start, end, span):
                        new_event = {}
                        new_event['_time'] = str(x)
                        for y in fields:
                            new_event[y] = r.get(y, None)
                        new_results.append(new_event)
                except (ValueError, TypeError):
                    # skip events whose start/end values aren't numeric
                    pass

        results = new_results

    except Exception as e:
        import traceback
        stack = traceback.format_exc()
        results = splunk.Intersplunk.generateErrorResults(
            str(e) + ". Traceback: " + str(stack))

    return results

results, dummyresults, settings = splunk.Intersplunk.getOrganizedResults()
results = generateNewEvents(results, settings)
splunk.Intersplunk.outputResults(results)
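
For reference, here is a minimal sketch (not part of the command itself) of how span-aligned starttime and endtime values could be derived from raw epoch times before piping events into tranexpand. The function name and the seconds-since-midnight timestamps are made up for illustration.

# Illustration only: floor raw timestamps to the start of their
# span-sized bucket, matching the 8:15:12am -> 8:00:00am example above.
def bucket_bounds(start_epoch, end_epoch, span_seconds):
    starttime = start_epoch - (start_epoch % span_seconds)
    endtime = end_epoch - (end_epoch % span_seconds)
    return starttime, endtime

# 08:15:12 and 08:37:35 expressed as seconds since midnight, 30-minute span
start, end = bucket_bounds(29712, 31055, 1800)
assert start == 28800  # 08:00:00
assert end == 30600    # 08:30:00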

Re: Concurrent users per time bucket from transactions

SplunkTrust

1) First, it's worth saying that if it weren't for the AAA vs BBB thing, this would be a very straightforward use case for the concurrency command.

eval duration=End-Start | concurrency start=Start duration=duration | timechart max(concurrency)

2) But you need a split by. The need to "split by" while calculating the concurrency makes it complicated. Fortunately this is fairly well-trodden, if extremely advanced, territory. And this approach will be much faster than the kind of approach that needs things like map and gentimes and makecontinuous.

eval duration=End-Start 
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + duration) 
| sort 0 + _time 
| fillnull Field_Name value="NULL" 
| streamstats sum(increment) as post_concurrency by Field_Name 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency) 
| timechart bins=400 max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by Field_Name limit=30 
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')] 
| fields - last_concurrency* max_concurrency*
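
In case the mechanics are easier to follow outside SPL, here is a minimal Python sketch of the same +1/-1 sweep-line idea; the sample intervals and field values are made up for illustration.

from collections import defaultdict

# (start, duration, field_value) -- made-up sample intervals
events = [(100, 50, "AAA"), (120, 100, "AAA"), (130, 30, "BBB")]

# Emit a +1 row at each start and a -1 row at each end: the analogue of
# eval increment = mvappend("1","-1") | mvexpand increment
# | eval _time = if(increment==1, _time, _time + duration)
deltas = []
for start, duration, key in events:
    deltas.append((start, 1, key))
    deltas.append((start + duration, -1, key))

# sort 0 + _time, then streamstats sum(increment) by Field_Name
deltas.sort()
running = defaultdict(int)
for t, inc, key in deltas:
    running[key] += inc
    # running[key] is the concurrency for this field value at time t;
    # the final eval in the search adds 1 back on -1 rows so an interval
    # still counts itself at its own end point
    print(t, key, running[key])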

3) But I think you have a third wrinkle: if the same user shows up for the same resource more than once in a given time period, you don't want to double count them? Assuming I'm right, you need to be a little more careful about how the counting gets done. I've taken a crack at that here, with an extra streamstats and an extra eval to cut away the double counting.

eval duration=End-Start 
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + duration) 
| sort 0 + _time 
| fillnull Field_Name value="NULL" 
| streamstats sum(increment) as post_concurrency_per_user by Field_Name Unique_Visitor_ID 
| eval post_concurrency=if(post_concurrency_per_user>1,1,0) 
| streamstats sum(post_concurrency) as post_concurrency by Field_Name 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency) 
| timechart bins=400 max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by Field_Name limit=30 
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')] 
| fields - last_concurrency* max_concurrency*
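
Here is the same sketch extended for the de-duplication wrinkle: instead of a raw counter, track which users currently have at least one open interval per field value, so each user counts at most once. The sample data is again made up.

from collections import defaultdict

# (start, duration, field_value, user_id) -- u1 has two overlapping
# intervals that should only count once toward concurrency
events = [(100, 50, "AAA", "u1"), (120, 100, "AAA", "u1"), (130, 30, "AAA", "u2")]

deltas = []
for start, duration, key, user in events:
    deltas.append((start, 1, key, user))
    deltas.append((start + duration, -1, key, user))

deltas.sort()
open_intervals = defaultdict(int)  # open interval count per (field value, user)
active_users = defaultdict(set)    # users with an open interval, per field value
for t, inc, key, user in deltas:
    open_intervals[(key, user)] += inc
    if open_intervals[(key, user)] > 0:
        active_users[key].add(user)
    else:
        active_users[key].discard(user)
    print(t, key, len(active_users[key]))  # distinct concurrent users at time t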

Re: Concurrent users per time bucket from transactions

Champion

Thanks @sideview.

This is the newer path I took a while ago in our app, but it was even more complicated than what you came up with in scenario 3). If you are interested, you can check out some of the macros in our app (Layer8Insight App for Splunk, splunkbase.splunk.com/app/3171).

We pre-calculate the time bins down to the minute during which a user touches the resources, use mvexpand to expand that set of time bins into events, then run the various aggregations over those events.

I would say come to my talk at .conf, but my proposal was rejected. I was trying to present a generic approach to handling durational events/data instead of the discrete/sampled data most people are used to in Splunk. 😞


Re: Concurrent users per time bucket from transactions

Champion

This is the more generic approach that is used heavily in one of my apps. It splits the time ranges marked by Start and End into 1-minute (60-second) time buckets, with the duration2 field indicating how much of each time bucket the event occupied. Assume Start and End are epoch times in seconds (the round(..., 3) calls preserve millisecond precision).

Note: @sideview was very close and much more detailed in terms of options for simpler cases (which is why I upvoted him). My case was more complicated and required the most generic approach I could come up with, i.e., transaction and concurrency couldn't handle it.

 BASE SEARCH ... 
 | eval earliest=Start
 | eval latest=End
 | eval duration = latest - earliest
 | eval start_time_min = floor(earliest - (earliest % 60))
 | eval end_time_min   = floor(latest - (latest % 60))
 | eval time_bins = mvrange(start_time_min, end_time_min + 1, 60)
 | mvexpand time_bins 
 | eval duration2 = if(start_time_min == end_time_min, duration, if(start_time_min == time_bins, round(start_time_min + 60 - earliest, 3), if(end_time_min == time_bins, round(latest - end_time_min, 3), 60))) 
 | rename time_bins as _time
 | table _time duration2 Field_Name Unique_Visitor_ID
 | eval _span = 60
 | ... do stats or whatever you need
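
If it helps to see the duration2 arithmetic outside SPL, here is a minimal Python sketch of the same per-bucket overlap computation; the helper name and sample times are made up for illustration.

# For each 60-second bucket an event overlaps, compute how many seconds
# of that bucket the event actually occupied (the duration2 logic above).
def bucket_overlaps(start, end, span=60):
    first = start - (start % span)   # start_time_min
    last = end - (end % span)        # end_time_min
    out = []
    for t in range(int(first), int(last) + 1, span):  # mvrange | mvexpand
        if first == last:
            dur = end - start                     # event fits in one bucket
        elif t == first:
            dur = round(first + span - start, 3)  # partial first bucket
        elif t == last:
            dur = round(end - last, 3)            # partial last bucket
        else:
            dur = span                            # fully covered middle bucket
        out.append((t, dur))
    return out

# Event from 08:15:12 to 08:17:30 as seconds since midnight: three buckets
print(bucket_overlaps(29712, 29850))  # [(29700, 48), (29760, 60), (29820, 30)]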
