I have the following event format, and I need to calculate concurrency for it:
Event, starttime=yyyy-mm-dd hh:mm:ss, duration=, sourceip=a.b.c.d
I would like to find out the concurrency of the Event based on sourceip. I have the following search:
| rex "duration=(?.*?)," | eval StartTime=round(strptime(startTime,"%Y-%m-%d %H:%M:%S"),0) | concurrency start=StartTime duration=Duration | timechart span=1m max(concurrency) by sourceip
But for some reason, it sums up the result for all sourceips together. I'm wondering if I'm using the concurrency command correctly.
Another task: I would like to see just the top 10 concurrency by sourceip in a timechart, since there are so many sourceips. Any suggestions?
Thanks,
The concurrency command is no good when you want to split by a field like this. The reason why the timechart isn't doing the right thing is that it's too late. The concurrency command has already calculated only a global concurrency, whereas what it needs to do is calculate concurrency separately for each sourceip.
It took me a long time (years, really) to find search language that could calculate concurrency by a split-by field, both accurately and reasonably efficiently. It does exist, but it's pretty advanced. I'm assuming you already have fields called StartTime, Duration, and sourceip, and you want to tack this on the end:
| eval _time=StartTime
| eval increment = mvappend("1","-1")
| mvexpand increment
| eval _time = if(increment==1, _time, _time + Duration)
| sort 0 + _time
| fillnull sourceip value="NULL"
| streamstats sum(increment) as post_concurrency by sourceip
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart bins=400 max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by sourceip
| filldown last_concurrency*
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')]
| fields - last_concurrency* max_concurrency*
If you're familiar with streamstats, the bits with the increment and the streamstats will be pretty clear - it is literally keeping a little record of each start and end, and incrementing/decrementing a separate counter for each value of sourceip.
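If you want to see what the increment/mvexpand step does in isolation, here is a minimal runnable sketch using makeresults, with made-up values (the 10.0.0.1 address and 120-second duration are purely illustrative):
| makeresults
| eval StartTime=now()-600, Duration=120, sourceip="10.0.0.1"
| eval _time=StartTime
| eval increment = mvappend("1","-1")
| mvexpand increment
| eval _time = if(increment==1, _time, _time + Duration)
| table _time sourceip increment
Each original event becomes two rows, a +1 at its start time and a -1 at its end time, and the streamstats then keeps a running sum of those per sourceip.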
You might wonder why the timechart starts differentiating between last and max concurrency, and once filldown and foreach get involved, things get pretty nutty. The core problem is that while computing the max concurrency per timebucket is easy, we also need to preserve the last-known concurrency value for each timebucket and sourceip, or else the concurrency math gets a little inaccurate in the following buckets. That's what that stuff is doing.
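To see why the filldown/foreach pair matters, here is a synthetic example with one fake sourceip (10.0.0.1) and a quiet middle bucket, where max(concurrency) would come out null and only the carried-forward last_concurrency value can fill the gap:
| makeresults count=3
| streamstats count as row
| eval _time=_time + row*60
| eval "max_concurrency: 10.0.0.1"=case(row==1, 2, row==3, 4)
| eval "last_concurrency: 10.0.0.1"=case(row==1, 1, row==3, 4)
| filldown last_concurrency*
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')]
| fields - last_concurrency* max_concurrency*
The middle row ends up with the carried-forward value 1 instead of a gap, which is exactly what keeps the math honest across quiet buckets.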
I had a similar problem, counting the parallel/concurrent HTTP requests grouped by time and host (which means the active threads on each server), so I'll provide my solution:
index=jira-prod source="/opt/jira/logs/access_log*"
| rex field=_raw "^(?<IP>\d+\.\d+\.\d+\.\d+) (?<REQUEST_ID>[0-9]+x[0-9]+x[0-9]+) (?<USER>\S+) \[.+\] \"(?<REQUEST>[A-Z]+ \S+)-? HTTP/1.1\" (?<STATUS>[0-9]+) (?<BYTES>[0-9]+) (?<TIME>[0-9]+) \"(?<REFERER>[^\"]+)\".*$"
| eval DURATION=TIME/1000
| eval START_AT=floor(_time-DURATION)
| eval END_AT=floor(_time)
| eval IN_MOMENT=mvrange(START_AT,END_AT,1)
| mvexpand IN_MOMENT
| eval _time=strptime(""+IN_MOMENT,"%s")
| chart count as COUNT, max(DURATION) as MAX_DURATION by _time, host
This parses a real Atlassian JIRA access log, where TIME is the request duration in milliseconds (hence DURATION=TIME/1000) and each request is expanded into one row per second it was active.
This worked fine for me.
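The heart of that approach is the mvrange/mvexpand pair, which fans each request out into one row per second it was active. Here is a minimal sketch with a made-up 3-second duration (note that for long durations this expansion multiplies your row count accordingly):
| makeresults
| eval DURATION=3
| eval START_AT=floor(_time-DURATION), END_AT=floor(_time)
| eval IN_MOMENT=mvrange(START_AT,END_AT,1)
| mvexpand IN_MOMENT
| eval _time=strptime(""+IN_MOMENT,"%s")
| table _time IN_MOMENT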
Great solution @sideview! I've been struggling with a split-by concurrency problem for a couple of days. The concurrency command works fine when I search for events with a single value used for the split, but when I search for all events with all split-by values, the numbers aren't right. I found out why myself and thought it would be impossible. Your spooky bit of SPL solves it nicely, and running it line by line while looking at the results, I understand why. Thanks!
My code that wasn't working correctly:
index=mfpublic sourcetype=SMF030 SMF30JNM=JOB* SMF30CLS=*
| stats earliest(_time) as start latest(_time) as stop by SMF30JNM SMF30CLS
| eval _time=start, duration=stop-start
| concurrency duration=duration
| timechart limit=20 span=15m max(concurrency) by SMF30CLS
And the working code with your solution:
index=mfpublic sourcetype=SMF030 SMF30JNM=JOB* SMF30CLS=*
| stats earliest(_time) as start latest(_time) as stop by SMF30JNM SMF30CLS
| eval _time=start, duration=stop-start
| eval increment = mvappend("1","-1")
| mvexpand increment
| eval _time = if(increment==1, _time, _time + duration)
| sort 0 + _time
| fillnull SMF30CLS value="NULL"
| streamstats sum(increment) as post_concurrency by SMF30CLS
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| timechart limit=0 span=15m max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by SMF30CLS
| filldown last_concurrency*
| foreach "max_concurrency: *" [eval <<MATCHSTR>>=coalesce('max_concurrency: <<MATCHSTR>>','last_concurrency: <<MATCHSTR>>')]
| fields - last_concurrency* max_concurrency*
Thanks again!
This needs to become a core function in Splunk.
The amount of time I wasted on concurrency before I found this was way too much.
This now makes my "concurrency per host" problem disappear.
Thanks
This is also useful for determining concurrency for a single series when you don't otherwise have enough data points to be particularly useful, because you're creating a data point at the "end" of the timeframe.
Thanks for working out the query; looking at the result, it really gives me what I want now. However, I am not sure I understand why you use bins=400 and the foreach statement.
The bins setting is actually more of a personal preference. I usually find I want a bit more granularity in my timecharts, so feel free to adjust it up or down. The number specifies a ceiling on the number of timebuckets in the displayed timerange, and timechart will come as close to that ceiling as it can. The default is 100, and those buckets are too chunky for my concurrency use cases.
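If you'd rather pin the bucket size exactly instead of letting timechart pick one under the bins ceiling, you can swap bins for span (the 5m here is just an example value):
| timechart span=5m max(concurrency) as max_concurrency last(post_concurrency) as last_concurrency by sourceip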
The purpose of the foreach is, yes, tied in with the very strange output from timechart (delete back so the timechart is the last clause and run it, then view it as a table and you'll see what I mean). The filldown smears the last_concurrency values forward through the empty buckets, and the foreach does some peculiar logic to put it all back together, reconciling each sourceip's last-concurrency and max-concurrency values so that each bucket gets the right max concurrency value.
Thanks, sideview, for the detailed response. Besides the above result, I also want to take a look at the concurrency distribution to find out if it matches a z-distribution. I modified the query to the following:
| eval StartTime=round(strptime(startTime,"%Y-%m-%dT%H:%M:%SZ"),0)
| eval _time=StartTime
| eval increment = mvappend("1","-1")
| mvexpand increment
| eval _time = if(increment==1, _time, _time + Duration)
| sort 0 + _time
| fillnull sourceip value="NULL"
| streamstats sum(increment) as post_concurrency by sourceip
| eval concurrency = if(increment==-1, post_concurrency+1, post_concurrency)
| stats count(concurrency) by concurrency
It seems to give me the data I want, but I would like to get your opinion. I am not sure about the granularity of concurrency here; is it counted by the second? Both StartTime and Duration are down to the second.
Is there a way to make concurrency the x-axis and count(concurrency) the y-axis?
Thanks,
I see what you're trying to do, but I'm not sure | stats count(concurrency) by concurrency is the way to go about it. That clause will just count the number of rows that at some point have each individual integer value of concurrency. It's close to a working definition of a frequency distribution, but I'm not sure it's what you want; I'd want a frequency distribution of concurrency to have rows that are timebuckets of equal length.
And it raises the question: do you want to see the frequency distribution of overall concurrency, or of concurrency split by the values of sourceIp?
To be honest I think this is a separate question. The answer and comment thread for this sub-question might get too complex. We can just link the two questions together and it'll be more usable that way.
I would like to see how frequently the same concurrency number happened; if two different sourceips show the same concurrency at a certain point, I would like to count it as two. This is more of a statistical analysis of our traffic pattern.
I understand completely, and I still think it's a separate question. I also still think that an approach like stats count by concurrency is statistically suspect and going to skew your results significantly; not because there's anything wrong with that command, but because the rows going into it do not fit the right assumptions at all.
As a thumbnail sketch of what will work: onto the very end of all the search language I listed, you want to use the untable command, like untable _time sourceIp concurrency, to unwind the output down to rows that are each a distinct combination of time and sourceip, with a concurrency number for each such combination. That's the set of rows you want to do this kind of analysis on. From there, yes, you can do stats count by concurrency, or if you want to bucket it, | bucket concurrency span=5 | stats count by concurrency | sort 0 - concurrency
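Putting that together, tacked onto the very end of the full search from my answer (after the final fields clause, so the remaining columns are _time plus one column per sourceip value):
| untable _time sourceip concurrency
| bucket concurrency span=5
| stats count by concurrency
| sort 0 - concurrency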
Sorry, just got back from a trip. I agree with you that this should be a separate question. I have created a new one at:
http://answers.splunk.com/answers/230633/how-to-calculate-concurrency-distribution.html
Can you please link it? Since you know the history of the question and have helped a lot.
Thanks, and sorry for complicating the question.
Hi jgcsco,
To see just the top 10 concurrency by sourceip, add limit=10 to your search like below:
| rex "duration=(?<Duration>.*?)," | eval StartTime=round(strptime(startTime,"%Y-%m-%d %H:%M:%S"),0) | concurrency start=StartTime duration=Duration | timechart limit=10 span=1m max(concurrency) by sourceip
Well, Splunk is still not grouping the concurrency result by sourceip correctly; it's still a sum of them. Not sure if it is due to the large dataset: sourceip has around 1000 distinct values, Duration is anywhere from 30 minutes to 3 hours, and at peak the total event count can reach 40K per minute.
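For what it's worth, one pattern that can help at that scale is to restrict the search to the busiest sourceips up front with a subsearch, before doing the mvexpand-based calculation. A rough sketch, where index=myindex is a placeholder for wherever your events live, and top ranks by raw event count (only a proxy for top concurrency):
index=myindex [ search index=myindex | top limit=10 sourceip | fields sourceip ]
| rex "duration=(?<Duration>.*?),"
| eval StartTime=round(strptime(startTime,"%Y-%m-%d %H:%M:%S"),0)
Then continue with the eval/mvexpand/streamstats/timechart steps from the accepted answer above.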