Say I have a series of jobs, each involving a certain number of members:
_time MemberCount JobRunTime (min) JobName
01:00:00 100 15 Job1
01:05:00 200 30 Job2
01:15:00 300 50 Job3
01:30:00 80 10 Job4
I want to show the concurrent total of members over time:
- during the first 5 minutes, the total number of members is 100;
- between 5 and 15 minutes, the total is 300 (100 + 200);
- between 15 and 30 minutes, 500 (300 + 300 - 100);
- between 30 and 35 minutes, 580 (500 + 80);
- between 35 and 40 minutes, 380 (580 - 200);
- between 40 and 65 minutes, 300 (380 - 80);
and so on.
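The interval arithmetic above can be sanity-checked outside Splunk with a short sweep-line sketch in Python (just a model of the expected totals, not part of any SPL search): add MemberCount when a job starts, subtract it when the job ends, and walk the change points in time order.

```python
jobs = [  # (start minute, MemberCount, JobRunTime in minutes) -- sample data from the question
    (0, 100, 15),   # Job1
    (5, 200, 30),   # Job2
    (15, 300, 50),  # Job3
    (30, 80, 10),   # Job4
]

# Sweep line: +count at each job's start minute, -count at its end minute
deltas = {}
for start, members, runtime in jobs:
    deltas[start] = deltas.get(start, 0) + members
    deltas[start + runtime] = deltas.get(start + runtime, 0) - members

totals = {}      # minute -> concurrent member total from that minute onward
running = 0
for t in sorted(deltas):
    running += deltas[t]
    totals[t] = running
    print(f"from minute {t:2d}: {running} concurrent members")
```

This reproduces the step values listed above (100, 300, 500, 580, 380, 300), dropping to 0 at minute 65 when Job3 ends.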
So this is like the task of calculating concurrency, but instead of mere concurrency of events, I want concurrent total of a field value. The closest I have got is
[ original search] | appendpipe [
eval _time=_time + JobRunTime*60
] | timechart max(MemberCount) by JobName
This will give me an "ending" event for each job start time, so I can connect the two dots as a straight line, and see that it overlaps with some other lines on time axis. What I want is a column chart (or area chart as resolution increases) that stacks Job1's MemberCount on top of Job2's when it starts, and so on. How do I do that?
Try this
your base search | table _time JobName JobRunTime MemberCount
| eval inc=mvrange(0, JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60
| stats sum(MemberCount) as MemberCount by _time
| timechart span=5m max(MemberCount) as ConcurrentMembers
With your sample data, the output will look like this.
_time ConcurrentMembers
2015-01-27 01:00:00 100
2015-01-27 01:05:00 300
2015-01-27 01:10:00 300
2015-01-27 01:15:00 600
2015-01-27 01:20:00 500
2015-01-27 01:25:00 500
2015-01-27 01:30:00 580
2015-01-27 01:35:00 580
2015-01-27 01:40:00 380
2015-01-27 01:45:00 300
2015-01-27 01:50:00 300
2015-01-27 01:55:00 300
2015-01-27 02:00:00 300
2015-01-27 02:05:00 300
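The table above can be reproduced outside Splunk with a short Python sketch of what this pipeline computes (minute offsets from 01:00; a model for verification, not the SPL itself). Note that mvrange(0, JobRunTime+1) makes the end minute inclusive, which is why 01:15 shows 600 rather than 500: Job1's last minute and Job3's first minute both land on minute 15.

```python
from collections import defaultdict

jobs = [  # (start offset in minutes from 01:00, MemberCount, JobRunTime in minutes)
    (0, 100, 15),   # Job1
    (5, 200, 30),   # Job2
    (15, 300, 50),  # Job3
    (30, 80, 10),   # Job4
]

# eval inc=mvrange(0, JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60
# then stats sum(MemberCount) by _time: per-minute concurrent totals
per_minute = defaultdict(int)
for start, members, runtime in jobs:
    for inc in range(0, runtime + 1):   # end-inclusive, like mvrange(0, JobRunTime+1)
        per_minute[start + inc] += members

# timechart span=5m max(MemberCount): max per-minute total in each 5-minute bucket
bucket_max = defaultdict(int)
for minute, total in per_minute.items():
    b = minute // 5 * 5
    bucket_max[b] = max(bucket_max[b], total)

for b in sorted(bucket_max):
    print(f"01:{b:02d}" if b < 60 else f"02:{b - 60:02d}", bucket_max[b])
```

Running this prints exactly the bucket values in the answer's table, including 600 at 01:15 and 580 at 01:35.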
Excellent! Because the ability to single out the top contributing jobs in a big peak is what I'm after, I need to do "by JobName". But this is straightforward.
my base search | table _time JobName JobRunTime MemberCount
| eval inc=mvrange(0, JobRunTime+1) | mvexpand inc | eval _time=_time + inc*60
| stats sum(MemberCount) as MemberCount by _time JobName
| timechart span=1h max(MemberCount) as ConcurrentMembers by JobName
While I continue to digest the method and fine tune with conditions in my data and limitations of Splunk's graphics, this really eased my immediate pain. (I have been trying to do this for months if not years.)
@somesoni2: Some nuanced questions.
1. In mvrange(), what is the significance of JobRunTime + 1? If I run this over jobs that connect end to tail, i.e., when one job finishes the next one starts, it seems that JobRunTime - 1 would eliminate artificial overlaps. (I know that boundary conditions are prior knowledge and not a logical conclusion, so (0, JobRunTime) or (0, JobRunTime - 1) will depend on the use case. I just want to know if JobRunTime + 1 has a special meaning.)
2. In mvrange() | mvexpand, it appears to me that (1, range) should work as well as (0, range), because the event at _time + 0 should already exist as the original event. Does this sound right?
3. stats uses sum() as output, whereas the subsequent timechart uses max(). Any particular consideration as to the use of sum() as opposed to using max() in both places? (Given particularities of my use case not described in this question, namely that my JobName is not always unique, sum() offers some limited advantages.)

Well, you need to hack it a bit further to get those timechart buckets filled correctly.
<your search here> | eval endtime=_time+JobRunTime*60
| timechart max(MemberCount) as Count max(endtime) as maxT by JobName
| rename "Count: *" as * | rename "maxT: Job*" as max*
| filldown
| foreach Job* [eval <<FIELD>>=if(_time<max<<MATCHSTR>>, <<FIELD>>, null())]
| fields - max*
Select Area chart and format it as Stacked and Connected. [screenshot: stacked area chart of the sample jobs]
This is very helpful. But if I apply this directly, it only stacks counts up; it never takes them away after JobRunTime ends. Why so? My data have a lot more jobs, but I have already set limit=0 in timechart.
Not sure what "only stacks count up" means. The chart above was produced by applying the search on a table as below.
_time, MemberCount, JobRunTime, JobName
1422336123, 100, 15, Job1
1422336423, 200, 30, Job2
1422337023, 300, 50, Job3
1422337923, 80, 10, Job4
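To see why the chart steps down again after each job ends, here is a minimal Python sketch of what the filldown + foreach stage effectively computes on the table above (assuming 5-minute timechart buckets; this is a model for verification, not the SPL itself): each job's count occupies every bucket from its start bucket up to, but not including, its end time, because the foreach nulls the value once _time reaches endtime.

```python
from collections import defaultdict

rows = [  # (_time epoch sec, MemberCount, JobRunTime min, JobName) -- the table above
    (1422336123, 100, 15, "Job1"),
    (1422336423, 200, 30, "Job2"),
    (1422337023, 300, 50, "Job3"),
    (1422337923, 80, 10, "Job4"),
]

SPAN = 300  # 5-minute timechart buckets

# Net effect of timechart + filldown + foreach: a job's count fills every
# bucket from its start bucket until its end time, then goes null.
series = defaultdict(dict)   # JobName -> {bucket epoch: MemberCount}
for t, members, runtime, name in rows:
    endtime = t + runtime * 60
    bucket = t // SPAN * SPAN
    while bucket < endtime:  # the foreach nulls the value once _time >= endtime
        series[name][bucket] = members
        bucket += SPAN

# Stacked height per bucket = sum over jobs (what the stacked area chart shows)
stacked = defaultdict(int)
for name in series:
    for bucket, members in series[name].items():
        stacked[bucket] += members
```

The stacked totals rise and fall (100, 300, 600, 500, 580, 380, 300), matching the first answer's table, rather than increasing monotonically.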
I mean the graph you obtained using the sample data in the question is perfect. But I cannot get the same result on my real(ish) data; member counts simply add up and up, so the total becomes a monotonic increase. Something I haven't understood about the use of <<FIELD>> and so on, I guess. Using 4 jobs of the same member count, each lasting close to 4 hours consecutively, I expect a flat stripe with different colour bands. Instead, the bands just add up one after another to form a horn-like shape. (I cannot post a graph.)
Well, feel free to paste more data.
When I compare what I retrieve from my sample data that give me a "horn shape" and the data posted in this question, they are not different. So it must be something I didn't set up correctly in Splunk.
_time JobName JobRunTime MemberCount
2014-08-05 10:05:00 76363659 179.75 380458
2014-08-05 06:05:00 76362122 114.75 380458
2014-08-05 02:05:00 76360937 207.75 380458
2014-08-04 22:05:00 76359961 201.75 380458
2014-08-04 18:05:00 76358947 206.75 380458
Just confirmed that it is not a dataset difference, but something wrong with my desktop/experimental Splunk 6.1.4. I ran your solution over limited real data on our production server (6.1.2), and the output looks normal and plausible; no horn shape comes out. I have yet to validate the output manually, but I'm now more troubled by possible setup errors in my experimental Splunk.
In a way this is similar to the empty time bucket question (http://answers.splunk.com/answers/149425/how-to-produce-empty-time-buckets.html). If, instead of filling "empty" time buckets with 0, we fill them with the MemberCount of the respective events, the job is done. In other words, this problem has multiple start and end times to be handled. If I can bucket time within each of these event intervals, there may be a way to get the answer.