How to graph sum of overlapping values given start time and duration?

Engager

I've searched here for quite a while and didn't find what I'm looking for, or maybe I'm not wording it correctly...

I need to graph cumulative CPU core usage for multiple events given time, duration and cpuusage. Only "start" events are recorded.

For example:

Record A starts at 5:00pm, runs for 30 minutes, uses 10 cores.
Record B starts at 5:10pm, runs for 10 minutes, uses 100 cores.

I need to graph the hill this would result in: 0 cores before 5:00pm, 10 cores at 5:00pm, 110 cores from 5:10pm until 5:20pm, back down to 10 cores at 5:20pm, then 0 cores after 5:30pm, plotted at 10-minute intervals with no gaps between them.

The concurrency command doesn't work, since it only counts the number of overlapping events, and I need to sum a field across the overlapping events. Currently I have this query, which is close but only shows instantaneous usage at the time each matching/overlapping record was created (it ignores duration):

index=... source="..." (selection query) | dedup jobid | eval endt=_time+duration | stats min(_time) as start max(endt) as end sum(cpus) as cpus by _time | timechart span=10m sum(cpus)

SplunkTrust

This generates test data with _time, cpus and duration...

| gentimes start="01/25/2017:23:00:00" end="01/27/2017:01:00:00" increment=23s 
| eval cpus =(random() %10+1) * pow(10,random() %2 + 1)
| eval duration =(random() %100+1) * pow(10,random() %2)
| eval _time = starttime
| streamstats count as jobid
| table _time cpus duration jobid
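
To check the logic against the two records from the question instead of random data, something like the following could stand in for the generator above. This is only a sketch: the date is arbitrary, jobid 1 and 2 are stand-ins for Record A and Record B, and duration is assumed to be in seconds.

```spl
| makeresults count=2
| streamstats count as jobid
| eval _time = strptime(if(jobid=1, "01/25/2017:17:00:00", "01/25/2017:17:10:00"), "%m/%d/%Y:%H:%M:%S")
| eval cpus = if(jobid=1, 10, 100)
| eval duration = if(jobid=1, 1800, 600)
| table _time cpus duration jobid
```

Run through the remaining steps, this should give net changes of +10 at 17:00, +100 at 17:10, -100 at 17:20 and -10 at 17:30, so activeCPUs would read 10, 110, 10 and 0.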

This splits each event into a start that adds cpus and an end that removes the same cpus.

| eval bigtime = "time="._time." cpus=".tonumber(0+cpus)."!!!!time=".tonumber(_time+duration)." cpus=".tonumber(0-cpus)."!!!!"
| rex field=bigtime max_match=2 "time=(?<time>[^ ]*) cpus=(?<cpus>.*?)!!!!"
| eval mydata=mvzip(time,cpus,"!")
| mvexpand mydata
| table mydata jobid
| rex field=mydata "(?<time>[^ ]*)!(?<cpus>.*)"
| eval cpus=tonumber(cpus)
| eval _time = time
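
A shorter way to build the same start/end pairs, offered as a sketch rather than a tested replacement: create the two values with mvappend and split them apart after the mvexpand. The field name delta is made up for this example, and it assumes _time, cpus and duration are numeric, as they are in the test data.

```spl
| eval delta = mvappend(_time."!".cpus, (_time+duration)."!-".cpus)
| mvexpand delta
| eval _time = tonumber(mvindex(split(delta, "!"), 0)), cpus = tonumber(mvindex(split(delta, "!"), 1))
| table _time cpus jobid
```

Each original event becomes two records: one at the start time with +cpus and one at the end time with -cpus, with jobid carried along on both.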

Then this bins the time by a 1-second span, sums the adds and removes in each bin, and keeps a running total of them, which is the cumulative CPU stat you want.

| bin _time span=1s
| stats values(jobid) as jobid  sum(cpus) as netCPUs by _time
| streamstats sum(netCPUs) as activeCPUs

The final streamstats command can also be written using the equivalent accum...

| accum netCPUs as activeCPUs
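
The question asked for a graph at 10-minute intervals. One way to get there from activeCPUs, as a sketch: have timechart keep both the peak and the last value in each 10-minute bucket, then carry the last value forward into buckets where no job started or ended. The names peakCPUs and endCPUs are made up for this example.

```spl
| timechart span=10m max(activeCPUs) as peakCPUs last(activeCPUs) as endCPUs
| filldown endCPUs
| eval peakCPUs = coalesce(peakCPUs, endCPUs)
| fillnull value=0 peakCPUs endCPUs
```

Chart peakCPUs to see the highest usage reached in each bucket, or endCPUs to see usage as of the end of each bucket. Filling empty buckets from endCPUs rather than peakCPUs avoids carrying a peak forward after the jobs that caused it have finished.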

Engager

Wow! Lots of good stuff in there. Thank you for breaking it up and explaining it! Did exactly what I needed.

SplunkTrust

You're welcome. Yes, it's a pretty useful way of looking at this kind of requirement. Forgot to say, I put the jobid in there too so you could verify on the test data which "end" went with which "start".

Notice the part (second code chunk, lines 3,4,6) where the code mvzips together multiple multivalue fields (3), immediately before the mvexpand turns them into multiple records (4), then has to break them apart again after the mvexpand (6). This is a common pattern.

It seems like a natural place where mvexpand could be extended in some future release to something like | mvexpand time, cpus, doing all of that in one verb... but that's not available yet.
