Splunk Search

How to use the concurrency command to timechart the top 10 concurrencies by field sourceip?

jgcsco
Path Finder

I have the following events, for which I need to calculate concurrency:

Event, starttime=yyyy-mm-dd hh:mm:ss, duration=, sourceip=a.b.c.d

I would like to find the concurrency of these events split by sourceip. I have the following search:

| rex "duration=(?.*?),"  | eval StartTime=round(strptime(startTime,"%Y-%m-%d %H:%M:%S"),0) | concurrency start=StartTime duration=Duration | timechart span=1m max(concurrency) by sourceip

But for some reason, it sums up the results for all sourceip values together. I'm wondering whether I'm using the concurrency command correctly.

Another task: I would like to see just the top 10 concurrencies by sourceip in a timechart, since there are so many sourceips. Any suggestions?

Thanks,

1 Solution

sideview
SplunkTrust

The concurrency command is no good when you want to split by a field like this. The reason the timechart isn't doing the right thing is that by then it's too late: the concurrency command has already calculated a single global concurrency, whereas what you need is concurrency calculated separately for each sourceip.
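
You can see that global behavior in miniature with a synthetic example (a hedged sketch with made-up IPs and durations, not your data):

| makeresults count=2 
| streamstats count as n 
| eval sourceip=if(n==1, "10.0.0.1", "10.0.0.2"), duration=60 
| concurrency duration=duration 
| table _time sourceip duration concurrency

Both events overlap in time, so one of the rows reports concurrency=2 even though each sourceip only ever has one active event.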

It took me a long time (years, really) to find search language that could calculate concurrency with a split-by field, both accurately and reasonably efficiently. It does exist, but it's pretty advanced. I'm assuming you already have fields called StartTime, Duration, and sourceip, and you want to tack this on the end:

| eval _time=StartTime
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + Duration) 
| sort 0 + _time 
| fillnull sourceip value="NULL" 
| streamstats sum(increment) as post_concurrency by sourceip 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart bins=400 max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by sourceip 
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')] 
| fields - last_concurrency* max_concurrency*

If you're familiar with streamstats, the bits with the increment and the streamstats will be pretty clear - it is literally keeping a little record of each start and end, and incrementing/decrementing a separate counter for each value of sourceip.
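
To watch that mechanism in isolation, here is a hedged sketch on a single synthetic event (the field names match the search above; the values are made up):

| makeresults 
| eval sourceip="10.0.0.1", StartTime=now(), Duration=300 
| eval _time=StartTime 
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + Duration) 
| table _time sourceip increment

The single event becomes two rows: a +1 marker at its start time and a -1 marker at its end time. The streamstats running sum over those markers, per sourceip, is exactly the concurrency.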

You might wonder why the timechart starts differentiating between last and max concurrency, and then when filldown and foreach get involved, things get pretty nutty. The core problem is that while computing the max concurrency per timebucket is easy, we also need to preserve the last-known-concurrency value for each timebucket and sourceip, or else our concurrency math gets a little inaccurate for the following buckets. That's what that stuff is doing.

popovr
Engager

I had a similar problem - counting parallel/concurrent HTTP requests grouped by time and host (that is, the active threads on each server) - so here is my solution:

index=jira-prod source="/opt/jira/logs/access_log*"
| rex field=_raw "^(?<IP>\d+\.\d+\.\d+\.\d+) (?<REQUEST_ID>[0-9]+x[0-9]+x[0-9]+) (?<USER>\S+) \[.+\] \"(?<REQUEST>[A-Z]+ \S+)-? HTTP/1.1\" (?<STATUS>[0-9]+) (?<BYTES>[0-9]+) (?<TIME>[0-9]+) \"(?<REFERER>[^\"]+)\".*$"
| eval DURATION=TIME/1000
| eval START_AT=floor(_time-DURATION)
| eval END_AT=floor(_time)
| eval IN_MOMENT=mvrange(START_AT,END_AT,1)
| mvexpand IN_MOMENT
| eval _time=strptime(""+IN_MOMENT,"%s")
| chart count as COUNT, max(DURATION) as MAX_DURATION by _time, host

This is parsing a real log file of Atlassian JIRA where:

  • line 2 parses the JIRA access log and extracts its elements, including the duration of the request in milliseconds. Note that the request is logged at the moment it completes, so _time is the end time
  • lines 3-5 calculate the duration in seconds, the start second, and the end second
  • line 6 fills IN_MOMENT with each of the seconds during which the request is active, having at least one value when the start second is equal to the end second
  • line 7 duplicates the event for each of the seconds listed in IN_MOMENT, setting the event's IN_MOMENT field to the current second as a regular single value
  • line 8 is more of a hack - it converts IN_MOMENT from an epoch number into a timestamp
  • line 9 calculates whatever statistics/chart/timechart is needed, grouping by _time and host

This worked fine for me.
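
A minimal, self-contained version of the same per-second expansion, on synthetic data (the host values and durations here are made up for illustration):

| makeresults count=3 
| streamstats count as n 
| eval host=if(n%2==0,"server0","server1"), DURATION=n*2 
| eval START_AT=floor(_time-DURATION), END_AT=floor(_time) 
| eval IN_MOMENT=mvrange(START_AT,END_AT,1) 
| mvexpand IN_MOMENT 
| eval _time=IN_MOMENT 
| chart count as COUNT by _time, host

Each synthetic request is exploded into one row per active second, after which the chart simply counts rows per second and host.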

Hoekb03
Explorer

Great solution @sideview, I've been struggling with a split-by concurrency problem for a couple of days. The concurrency command works fine when I just search for events with a single value used for the split, but when I search for all events with all split-by values, the numbers aren't right. I found out why myself, and thought it would be impossible. Your spooky bit of SPL solves it nicely, and running it line by line and looking at the results, I understand why. Thanks!

My code that didn't work correctly:

index=mfpublic sourcetype=SMF030 SMF30JNM=JOB* SMF30CLS=*
| stats earliest(_time) as start latest(_time) as stop by SMF30JNM SMF30CLS
| eval _time=start, duration=stop-start
| concurrency duration=duration
| timechart limit=20 span=15m max(concurrency) by SMF30CLS

And the working code with your solution:

index=mfpublic sourcetype=SMF030 SMF30JNM=JOB* SMF30CLS=*
| stats earliest(_time) as start latest(_time) as stop by SMF30JNM SMF30CLS
| eval _time=start, duration=stop-start
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + duration) 
| sort 0 + _time 
| fillnull SMF30CLS value="NULL" 
| streamstats sum(increment) as post_concurrency by SMF30CLS
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart limit=0 span=15m max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by SMF30CLS
| filldown last_concurrency* 
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')] 
| fields - last_concurrency* max_concurrency*

Thanks again!

terryrankine
Engager

This needs to become a core function in Splunk.

The amount of time I wasted on concurrency until I found this was way too much.

This now makes my "concurrency per host" problem disappear.

Thanks

vbumgarner
Contributor

This is also useful for determining concurrency for a single series when you don't have enough data points for the result to be particularly useful, because you're creating a data point at the "end" of the timeframe.

jgcsco
Path Finder

Thanks for working out the query - looking at the result, it really gives me what I want now. However, I am not sure I understand why you use "bins=400" and the "foreach" statement.

sideview
SplunkTrust

The bins setting is actually more of a personal preference. I usually find I want a bit more granularity in my timecharts, so feel free to adjust it up or down. The number specifies a ceiling on the number of timebuckets across the displayed timerange, and timechart will come as close to that ceiling as it can. The default is 100, and those buckets are too chunky for my concurrency use cases.

The purpose of the foreach - yes, it's tied in with the very strange output from timechart (delete back so the timechart is the last clause and run it, then view it as a table and you'll see what I mean). The filldown smears the last_concurrency... values forward through the later buckets, and the foreach is doing some peculiar logic to put it all back together again, reconciling each sourceip's value of last concurrency and max concurrency so that each bucket gets the right max concurrency value.
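
As a toy illustration of that foreach/coalesce pattern (a hedged sketch with a single made-up sourceip column, not the real timechart output):

| makeresults count=2 
| streamstats count as n 
| eval "max_concurrency: 10.0.0.1"=if(n==1,5,null()), "last_concurrency: 10.0.0.1"=7 
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')] 
| table n "10.0.0.1"

The first row keeps its real max (5), while the second row, whose max is null, falls back to the last-known concurrency (7) - the same reconciliation the full search performs per bucket and sourceip.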

jgcsco
Path Finder

Thanks sideview for the detailed response. Besides the above result, I also want to look at the concurrency distribution, to find out whether it matches a z-distribution. I modified the query to the following:

| eval StartTime=round(strptime(startTime,"%Y-%m-%dT%H:%M:%SZ"),0) 
| eval _time=StartTime 
| eval increment = mvappend("1","-1") 
| mvexpand increment 
| eval _time = if(increment==1, _time, _time + Duration) 
| sort 0 + _time 
| fillnull sourceip value="NULL" 
| streamstats sum(increment) as post_concurrency by sourceip 
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency) 
| stats count(concurrency) by concurrency   

It seems to give me the data I want, but I would like to get your opinion. I am not sure about the granularity of concurrency here - is it counted by the second? Both StartTime and Duration are down to the second.

Is there a way to make concurrency the x-axis, and count(concurrency) the y-axis?

Thanks,

sideview
SplunkTrust

I see what you're trying to do, but I'm not sure | stats count(concurrency) by concurrency is the way to go about it. That clause will just count the number of events that at some point have each individual integer value of concurrency. It's close to a working definition of a frequency distribution, but I'm not sure it's what you want. I'd want a frequency distribution of concurrency to have rows that are timebuckets of equal length.

And it raises the question - do you want to see the frequency distribution of overall concurrency, or of concurrency split by the values of sourceip?

To be honest I think this is a separate question. The answer and comment thread for this sub-question might get too complex. We can just link the two questions together and it'll be more usable that way.

jgcsco
Path Finder

I would like to see how often the same concurrency number occurred; if two different sourceips show the same concurrency at a certain point, then I would like to count it as two. This is more of a statistical analysis of our traffic pattern.

sideview
SplunkTrust

I understand completely, and I still think it's a separate question. I also still think that an approach like "stats count by concurrency" is statistically suspect and going to skew your results significantly - not because there's anything wrong with that command, but because the rows going into it do not fit the right assumptions at all.

As a thumbnail sketch of what will work: onto the very end of all the search language I listed, you want to use the untable command, like untable _time sourceip count, to unwind the output down to rows that are each a distinct combination of time and sourceip, where each such combination has a concurrency number. That's the set of rows you want to do this kind of analysis on. From there, yes, you can do stats count by concurrency, or if you want to bucket it, | bucket concurrency span=5 | stats count by concurrency | sort 0 - concurrency
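
Putting that tail together on synthetic data (a hedged sketch - the makeresults rows stand in for the timechart output, with made-up sourceips as columns):

| makeresults count=4 
| streamstats count as n 
| eval _time=_time+n*60, "10.0.0.1"=n, "10.0.0.2"=n+4 
| fields _time, 10.0.0.* 
| untable _time sourceip concurrency 
| bucket concurrency span=5 
| stats count by concurrency 
| sort 0 - concurrency

untable unwinds the wide table into one row per (_time, sourceip) pair, and the bucketed stats then counts how often each concurrency range occurred.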

jgcsco
Path Finder

Sorry, just got back from a trip. I agree with you - this should be a separate question. I have created a new one at:

http://answers.splunk.com/answers/230633/how-to-calculate-concurrency-distribution.html

Can you please link it? You know the history of the question and have helped a lot.

Thanks, and sorry for complicating the question.

chimell
Motivator

Hi jgcsco,
To see just the top 10 concurrencies by sourceip, add limit=10 to your timechart like below:

    | rex "duration=(?.*?),"  | eval StartTime=round(strptime(startTime,"%Y-%m-%d %H:%M:%S"),0) | concurrency start=StartTime duration=Duration | timechart  limit=10 span=1m max(concurrency) by sourceip
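
One variation worth knowing (a hedged aside, since whether you want it is a matter of taste): timechart lumps everything beyond the limit into an OTHER series by default, and useother=f suppresses that:

| timechart limit=10 useother=f span=1m max(concurrency) by sourceip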

jgcsco
Path Finder

Well, Splunk is still not grouping the concurrency result by sourceip correctly, but rather summing them. I'm not sure if it is due to the large dataset. There are around 1000 sourceip entries, the Duration is anywhere from 30 minutes to 3 hours, and at peak time the total event count can reach 40K per minute.
