Solved: How to graph sum of overlapping values given start time and duration?

rbernharnavy · 03-21-2017

I've searched here for quite a while and didn't find what I'm looking for, or maybe I'm not wording it correctly...

I need to graph cumulative CPU core usage for multiple events given _time, duration and cpu_usage. Only "start" events are recorded.

For example:

Record A starts at 5:00pm, runs for 30 minutes, uses 10 cores.
Record B starts at 5:10pm, runs for 10 minutes, uses 100 cores.

I need to graph the hill this would result in: 0 cores before 5:00pm, 10 cores at 5:00pm, 110 cores from 5:10pm until 5:20pm, back down to 10 cores at 5:20pm, then 0 cores from 5:30pm onward (when Record A finishes), plotted at 10-minute intervals with no gaps in between.
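
As a sanity check on the numbers in this example, here is a minimal Python sketch, outside Splunk, that samples the expected core usage every 10 minutes. The calendar date and the `cores_in_use` helper are assumptions made for illustration; a job is treated as active from its start up to, but not including, its end.

```python
from datetime import datetime, timedelta

# Hypothetical records mirroring the example: (start, duration in minutes, cores).
# The calendar date is arbitrary; only the clock times matter.
jobs = [
    (datetime(2017, 3, 21, 17, 0), 30, 10),    # Record A
    (datetime(2017, 3, 21, 17, 10), 10, 100),  # Record B
]

def cores_in_use(at, jobs):
    """Sum the cores of every job whose [start, end) interval covers `at`."""
    return sum(
        cores
        for start, minutes, cores in jobs
        if start <= at < start + timedelta(minutes=minutes)
    )

# Sample every 10 minutes from 4:50pm through 5:40pm.
first = datetime(2017, 3, 21, 16, 50)
expected = {}
for step in range(6):
    at = first + timedelta(minutes=10 * step)
    expected[at.strftime("%H:%M")] = cores_in_use(at, jobs)

for label, cores in expected.items():
    print(label, cores)
```

This brute-force check scans every job for every sample, so it is only meant for verifying small examples by hand.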

The concurrency command doesn't work here because it only counts the number of overlapping events, and I need to sum a field across the overlapping events. Currently I have this query, which is close but only shows instantaneous usage at the moment the matching/overlapping records were created (it ignores duration):

index=... source="..." (selection query) | dedup jobid | eval endt=_time+duration | stats min(_time) as start max(endt) as end sum(cpus) as cpus by _time | timechart span=10m sum(cpus)

DalJeanis · 03-23-2017

This generates test data with _time, cpus and duration...

| gentimes start="01/25/2017:23:00:00" end="01/27/2017:01:00:00" increment=23s 
| eval cpus =(random() %10+1) * pow(10,random() %2 + 1)
| eval duration =(random() %100+1) * pow(10,random() %2)
| eval _time = starttime
| streamstats count as jobid
| table _time cpus duration jobid

This splits each event into a start that adds cpus and an end that removes the same cpus.

| eval bigtime = "time="._time." cpus=".tonumber(0+cpus)."!!!!time=".tonumber(_time+duration)." cpus=".tonumber(0-cpus)."!!!!"
| rex field=bigtime max_match=2 "time=(?<time>[^ ]*) cpus=(?<cpus>.*?)!!!!"
| eval mydata=mvzip(time,cpus,"!")
| mvexpand mydata
| table mydata jobid
| rex field=mydata "(?<time>[^ ]*)!(?<cpus>.*)"
| eval cpus=tonumber(cpus)
| eval _time = time
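
The idea behind this step can be sketched in Python, assuming hypothetical jobs given as (jobid, start, duration, cpus) tuples with times in epoch seconds. The `to_delta_events` name is made up for illustration and is not a Splunk function.

```python
# Hypothetical jobs as (jobid, start_epoch_seconds, duration_seconds, cpus).
jobs = [
    (1, 1000, 1800, 10),
    (2, 1600, 600, 100),
]

def to_delta_events(jobs):
    """Turn each job into two rows: +cpus at its start, -cpus at its end."""
    events = []
    for jobid, start, duration, cpus in jobs:
        events.append({"time": start, "cpus": cpus, "jobid": jobid})
        events.append({"time": start + duration, "cpus": -cpus, "jobid": jobid})
    return events

events = to_delta_events(jobs)
for event in events:
    print(event)
```

Because every job contributes one positive and one matching negative row, the deltas sum to zero once all jobs have finished.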

Then this bins the time by a 1-second span, and creates the cumulative CPU stats you want.

| bin _time span=1s
| stats values(jobid) as jobid  sum(cpus) as netCPUs by _time
| streamstats sum(netCPUs) as activeCPUs

The final streamstats command can also be written using the equivalent accum...

| accum netCPUs as activeCPUs
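
To see why a running sum of the net changes gives the active CPU count, here is a small Python sketch of the same two steps. The `deltas` list is hypothetical sample data, deliberately out of time order to show that sorting has to happen before accumulating.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical (time_in_seconds, cpu_delta) rows from a start/end split.
deltas = [(1000, 10), (2800, -10), (1600, 100), (2200, -100)]

# Similar in spirit to "stats sum(cpus) as netCPUs by _time":
# collapse the rows to one net change per timestamp.
net_by_time = defaultdict(int)
for time, delta in deltas:
    net_by_time[time] += delta

# Accumulate in time order; each running total is the number of CPUs
# in use from that timestamp until the next one.
times = sorted(net_by_time)
active = list(accumulate(net_by_time[t] for t in times))
usage = list(zip(times, active))
print(usage)
```

The result only has rows at timestamps where something started or ended, so a chart at a fixed interval would still need the last known value carried forward into the empty buckets.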

rbernharnavy · 03-23-2017

Wow! Lots of good stuff in there. Thank you for breaking it up and explaining it! Did exactly what I needed.

DalJeanis · 03-24-2017

You're welcome. Yes, it's a pretty useful way of looking at this kind of requirement. Forgot to say, I put the jobid in there too so you could verify on the test data which "end" went with which "start".

Notice the part (second code chunk, lines 3,4,6) where the code mvzips together multiple multivalue fields (3), immediately before the mvexpand turns them into multiple records (4), then has to break them apart again after the mvexpand (6). This is a common pattern.

It seems like a natural place where mvexpand could be extended in some future release, as something like | mvexpand time, cpus, to do all that in one verb... but that's not available yet.
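
For readers more at home in a general-purpose language, the zip / expand / split pattern looks roughly like this in Python. The `record` dictionary is hypothetical sample data, and the "!" delimiter matches the one used in the search above.

```python
# Hypothetical record with two parallel multivalue fields.
record = {"jobid": 7, "time": ["1000", "2800"], "cpus": ["10", "-10"]}

# Step 1 (like mvzip): pair the values up into one delimited multivalue field.
zipped = [f"{t}!{c}" for t, c in zip(record["time"], record["cpus"])]

# Step 2 (like mvexpand): emit one row per zipped value.
rows = [{"jobid": record["jobid"], "mydata": value} for value in zipped]

# Step 3 (like the second rex): split each value back into separate fields.
for row in rows:
    time_text, cpus_text = row.pop("mydata").split("!")
    row["time"] = int(time_text)
    row["cpus"] = int(cpus_text)

print(rows)
```

Zipping first keeps each time paired with its own cpus value; expanding the two fields independently would lose that pairing.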
