Splunk Search

How to compute concurrent members in events?

yuanliu
SplunkTrust
SplunkTrust

Say, I have a series of jobs involving a certain number of members,

_time        MemberCount  JobRunTime (min) JobName
01:00:00           100               15     Job1
01:05:00           200               30     Job2
01:15:00           300               50     Job3
01:30:00            80               10     Job4

I want to show that during the first 5 minutes, total number of members is 100, between 5 and 15 minutes, total is 300 (100+200), between 15 and 30 minutes, total is 500 (300 + 300 - 100), between 30 and 35 minutes, total is 580 (500 + 80), between 35 and 40 minutes, total is 380 (580 - 200), between 40 and 75 minutes, total is 300 (380 - 80), and so on.

So this is like the task of calculating concurrency, but instead of mere concurrency of events, I want concurrent total of a field value. The closest I have got is

 [ original search] | appendpipe [
  eval _time=_time + JobRunTime*60
  ] | timechart max(MemberCount) by JobName

This will give me an "ending" event for each job start time, so I can connect the two dots as a straight line, and see that it overlaps with some other lines on time axis. What I want is a column chart (or area chart as resolution increases) that stacks Job1's MemberCount on top of Job2's when it starts, and so on. How do I do that?

0 Karma
1 Solution

somesoni2
Revered Legend

Try this

your base search | table _time JobName JobRunTime MemberCount | eval inc=mvrange(0,JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60 | stats sum(MemberCount) as MemberCount by _time | timechart span=5m max(MemberCount) as ConcurrentMembers

With your sample data, output will be like this.

_time   ConcurrentMemberCount
2015-01-27 01:00:00     100
2015-01-27 01:05:00     300
2015-01-27 01:10:00     300
2015-01-27 01:15:00     600
2015-01-27 01:20:00     500
2015-01-27 01:25:00     500
2015-01-27 01:30:00     580
2015-01-27 01:35:00     580
2015-01-27 01:40:00     380
2015-01-27 01:45:00     300
2015-01-27 01:50:00     300
2015-01-27 01:55:00     300
2015-01-27 02:00:00     300
2015-01-27 02:05:00     300 

View solution in original post

somesoni2
Revered Legend

Try this

your base search | table _time JobName JobRunTime MemberCount | eval inc=mvrange(0,JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60 | stats sum(MemberCount) as MemberCount by _time | timechart span=5m max(MemberCount) as ConcurrentMembers

With your sample data, output will be like this.

_time   ConcurrentMemberCount
2015-01-27 01:00:00     100
2015-01-27 01:05:00     300
2015-01-27 01:10:00     300
2015-01-27 01:15:00     600
2015-01-27 01:20:00     500
2015-01-27 01:25:00     500
2015-01-27 01:30:00     580
2015-01-27 01:35:00     580
2015-01-27 01:40:00     380
2015-01-27 01:45:00     300
2015-01-27 01:50:00     300
2015-01-27 01:55:00     300
2015-01-27 02:00:00     300
2015-01-27 02:05:00     300 

yuanliu
SplunkTrust
SplunkTrust

Excerllent! Because the ability to single out top contributing jobs in a big peak is what I'm after, I need to do "by JobName". But this is straightforward.

my base search | table _time JobName JobRunTime MemberCount
 | eval inc=mvrange(0, JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60
 | stats sum(MemberCount) as MemberCount by _time JobName
 | timechart span=1h max(MemberCount) as ConcurrentMembers by JobName

While I continue to digest the method and fine tune with conditions in my data and limitations of Splunk's graphics, this really eased my immediate pain. (I have been trying to do this for months if not years.)

0 Karma

yuanliu
SplunkTrust
SplunkTrust

@somesoni2: Some nuanced questions.

  • In mvrange(), what is the significance of JobRunTime + 1? If I run this over jobs that connect end to tail, i.e., when one job finishes the next one starts, it seems that JobRunTime - 1 would eliminate artificial overlaps. (I know that boundary conditions are prior knowledge and not a logical conclusion, so (0, JobRunTime) or (0, JobRunTime-1) will depend on use case. Just want to know if JobRunTime+1 has a special meaning.)
  • To follow on mvrange() | mvexpand, it appears to me that (1,range) should work as well as (0,range), because event at _time + 0 should already have existed as the original event. Does this sound right?
  • How does mvrange determine increment when none is given? Direct test using my data seems to show that it takes 1 minute increment.
  • The first stats uses sum() as output, whereas the subsequent timechart uses max(). Any particular consideration as to the use of sum() as opposed to using max() in both places? (Particularities in my use case not described in this question, namely that my JobName is not always unique, sum() offer some limited advantages.)
0 Karma

_d_
Splunk Employee
Splunk Employee

alt textWell, you need to hack it a bit further to get those timechart buckets filled correctly.

<your search here> | eval endtime=_time+JobRunTime*60  | timechart max(MemberCount) as Count max(endtime) as maxT by JobName | rename "Count: *" as *  | rename "maxT: Job*" as max* | filldown | foreach Job* [eval <<FIELD>>=if(_time<max<<MATCHSTR>>, <<FIELD>>, '')] | fields - max*

Select Area chart and Format it as Stacked and Connected.alt text

0 Karma

yuanliu
SplunkTrust
SplunkTrust

This is very helpful. If I directly apply this, it only stacks count up, never takes away after JobRunTime. Why so? My data have a lot more jobs, but I have set limit=0 in timechart already.

0 Karma

_d_
Splunk Employee
Splunk Employee

Not sure what "only stacks count up" means. The chart above was produced by applying the search on a table as below.

 _time, MemberCount, JobRunTime, JobName
1422336123, 100, 15, Job1
1422336423, 200, 30, Job2
1422337023, 300, 50, Job3
1422337923, 80, 10, Job4
0 Karma

yuanliu
SplunkTrust
SplunkTrust

I mean the graph you obtained using the sample data in the question is perfect. But I cannot get the same result on my real(ish) data; member counts simply add up and up, so the total becomes a monotonic increase. Something I haven't understood about the use of <> and so on, I guess. Using 4 jobs of same member count, each lasting close to 4 hours consecutively, I expect a flat stripe with different colour bands. Instead, the bands just add up one after another to form a horn-like shape. (I cannot use graph.)

0 Karma

_d_
Splunk Employee
Splunk Employee

Well, feel free to paste more data.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

When I compare what I retrieve from my sample data that give me a "horn shape" and the data posted in this question, they are not different. So it must be something I didn't set up correctly in Splunk.

_time   JobName JobRunTime  MemberCount
2014-08-05 10:05:00 76363659    179.75  380458
2014-08-05 06:05:00 76362122    114.75  380458
2014-08-05 02:05:00 76360937    207.75  380458
2014-08-04 22:05:00 76359961    201.75  380458
2014-08-04 18:05:00 76358947    206.75  380458
0 Karma

yuanliu
SplunkTrust
SplunkTrust

Just confirmed that it is not the dataset difference, but something wrong with my desktop/experimental Splunk 6.1.4. I run your solution over limited real data on our production server (6.1.2), and the output looks normal and plausible. No horn shape comes out. I have yet to validate output manually, but I'm now more troubled by possible setup errors in my experimental Splunk.

0 Karma

yuanliu
SplunkTrust
SplunkTrust

In a way this is similar to the empty time bucket question (http://answers.splunk.com/answers/149425/how-to-produce-empty-time-buckets.html). If, instead of filling "empty" time buckets with 0, we fill them with MemberCount of respective events, the job is done. In other words, this problem has multiple start and end times to be handled. If I can bucket time within each of these event intervals, there maybe a way to get answer.

0 Karma
Get Updates on the Splunk Community!

Video | Welcome Back to Smartness, Pedro

Remember Splunk Community member, Pedro Borges? If you tuned into Episode 2 of our Smartness interview series, ...

Detector Best Practices: Static Thresholds

Introduction In observability monitoring, static thresholds are used to monitor fixed, known values within ...

Expert Tips from Splunk Education, Observability in Action, Plus More New Articles on ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...