How to edit my search to calculate values based on deltas of ps fields, grouped by PID?

ksh93
Explorer

Hello!
I'm trying to calculate values based on deltas of ps fields, grouped by PID, i.e. I want to refer to the previous timestamp but for the same PID. Obviously there are many PIDs listed for each timestamp.
Specifically, in pseudocode, I am calculating CPUAVG = 100 * ( this(CPUTIME) - last(CPUTIME) ) / ( this(ELAPSED) - last(ELAPSED) )
Surprisingly, I can't seem to find a similar situation when searching the questions/answers (or perhaps, as a Splunk newbie, I am not understanding the answers).
I had some success with bucket/stats (though with odd combinations of events occurring), but I think this should be a streamstats use case. Am I misunderstanding how the "by" clause works?

host=myhost index=os source=ps "java" 
| streamstats current=false window=1 first(CPUTIME) as last_CPUTIME first(ELAPSED) as last_ELAPSED by PID 
| eval CPUSINCE=strptime(CPUTIME,"%H:%M:%S")-strptime(last_CPUTIME,"%H:%M:%S") 
| eval ELAPSEDSINCE=strptime(ELAPSED,"%H:%M:%S")-strptime(last_ELAPSED,"%H:%M:%S") 
| eval CPUAVG=100*CPUSINCE/ELAPSEDSINCE 
| table _time PID CPUTIME last_CPUTIME CPUSINCE ELAPSED last_ELAPSED ELAPSEDSINCE CPUAVG

... the calculated and last_ values are all NULL. What am I doing wrong?
Any guidance appreciated 🙂


ksh93
Explorer

I think I have worked it out: setting the streamstats window=2 and global=false is half way there, but the results seem to calculate for 0, 1 and 2 rows, which is really very counter-intuitive.
So, by adding a count() of the number of rows included in the stats calc, I can use a subsequent where clause to limit it to ONLY the rows with EXACTLY two events included in the calculation:


host=svhwm0000006pr index=os source=ps
| eval host_dot_PID=host + "." + PID
| streamstats window=2 current=true global=false count(_time) as rowcount earliest(CPUTIME) as startCPUTIME latest(CPUTIME) as endCPUTIME earliest(ELAPSED) as startELAPSED latest(ELAPSED) as endELAPSED by host_dot_PID
| where rowcount=2
| eval CPUSINCE=coalesce(strptime(endCPUTIME,"%H:%M:%S"),strptime(endCPUTIME,"%d-%H:%M:%S"))-coalesce(strptime(startCPUTIME,"%H:%M:%S"),strptime(startCPUTIME,"%d-%H:%M:%S"),0)
| eval ELAPSEDSINCE=coalesce(strptime(endELAPSED,"%H:%M:%S"),strptime(endELAPSED,"%d-%H:%M:%S"))-coalesce(strptime(startELAPSED,"%H:%M:%S"),strptime(startELAPSED,"%d-%H:%M:%S"),0)
| eval CPUAVG=100*CPUSINCE/ELAPSEDSINCE
| timechart span=5mins limit=0 avg(CPUAVG) by host_dot_PID

FYI the ps CPUTIME and ELAPSED fields have an optional %d- on the front (number of days ie over 24 hours) so to cater for these I have the ugly coalesce() evals ... let me know if you have a neater way of handling optional values for strptime().
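
One possibly neater way to handle the optional day prefix is to skip strptime() and pull the components out with rex, making the day group optional in the regex. This is only a sketch: the makeresults/split/mvexpand lines just fabricate two sample values for illustration, and the field names cpu_d, cpu_h, cpu_m, cpu_s and cpu_seconds are made up here.

```
| makeresults
| eval CPUTIME=split("00:01:05,1-02:03:04", ",")
| mvexpand CPUTIME
| rex field=CPUTIME "^(?:(?<cpu_d>\d+)-)?(?<cpu_h>\d+):(?<cpu_m>\d+):(?<cpu_s>\d+)$"
| eval cpu_seconds = coalesce(tonumber(cpu_d), 0)*86400 + tonumber(cpu_h)*3600 + tonumber(cpu_m)*60 + tonumber(cpu_s)
| table CPUTIME cpu_seconds
```

For the two sample values this should give 65 and 93784 seconds. When there is no day prefix, cpu_d is not extracted, so the coalesce() falls back to 0. The same rex/eval pair, with different capture names, would be repeated for ELAPSED.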


DalJeanis
Legend

1) I don't know how you can get rowcount>2 under any circumstances with your code. rowcount<2 occurs for the first record processed for each host_dot_PID. Because global=false means that only the last two records will be looked at, whenever the host_dot_PID changes, Splunk will act as if it is seeing the first record of that type. Change to global=true if you want to get the prior record for that host_dot_PID no matter how many records back it was.

2) Also, remember that Splunk processes the most recent record first, so for a record at 8:00 and another at 8:02, the one at 8:00 is going to get rowcount=2, while the one at 8:02 is going to get rowcount=1. This probably works for your project, as long as you remember that the _time on each record will be the START of the period it covers. If you want the value on the later record, then start with | sort 0 _time.

3) Your coalesces do look weird, but they'll work, and I don't see anything more elegant to cover past-midnight values. It makes me wonder if occasionally there will be a reset and you'll end up with negative durations, though. You can finesse those with something like this:

| eval CPUSINCE=if(CPUSINCE<0,CPUSINCE%86400,CPUSINCE)
| eval ELAPSEDSINCE=if(ELAPSEDSINCE<0,ELAPSEDSINCE%86400,ELAPSEDSINCE)

DalJeanis
Legend

Well, thanks for the question, because it led me to find that I've answered a bunch of questions wrong, and I'm going to have to go back and find them all.

streamstats with window= and by fieldname don't play nice together unless you (A) sort the file before applying streamstats or (B) use global=true.


ksh93
Explorer

Thanks for your responses!
Yes I'll have to think about what _time I want the result represented at.
FYI those coalesces are broken under some circumstances: strptime doesn't seem to initialise timeptr, so the result is either relative to the current time or affected by the most recent call to strptime (I can't work out which), and so yes, I get several large negative values on a large dataset. I think this would be a problem with strptime() regardless of the coalesce().
A more reliable method is to break up the string manually with substr and tonumber() each component into submission!:


host=myhost index=os source=ps "java"
| eval host_dot_PID=host + "." + PID
| eval intCPUTIME = tonumber(substr(CPUTIME,-2,2))+tonumber(substr(CPUTIME,-5,2))*60+tonumber(substr(CPUTIME,-8,2))*60*60+if(match(CPUTIME,"-"),tonumber(replace(CPUTIME,"-.*",""))*24*60*60,0)
| eval intELAPSED = tonumber(substr(ELAPSED,-2,2))+tonumber(substr(ELAPSED,-5,2))*60+tonumber(substr(ELAPSED,-8,2))*60*60+if(match(ELAPSED,"-"),tonumber(replace(ELAPSED,"-.*",""))*24*60*60,0)
| streamstats window=2 current=true global=false count(_time) as rowcount earliest(intCPUTIME) as startCPUTIME latest(intCPUTIME) as endCPUTIME earliest(intELAPSED) as startELAPSED latest(intELAPSED) as endELAPSED by host_dot_PID
| where rowcount=2
| eval CPUSINCE = endCPUTIME - startCPUTIME
| eval ELAPSEDSINCE = endELAPSED - startELAPSED
| eval CPUAVG=100*CPUSINCE/ELAPSEDSINCE
| timechart span=5mins limit=0 avg(CPUAVG) by host_dot_PID
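
If the result should sit on the later event of each pair rather than the earlier one, the same search can be put into ascending time order before streamstats, following the | sort 0 _time suggestion earlier in the thread. A sketch, assuming the same fields as the search above and untested against real ps data; the two delta evals are folded into the CPUAVG line purely for brevity:

```
host=myhost index=os source=ps "java"
| eval host_dot_PID=host + "." + PID
| eval intCPUTIME = tonumber(substr(CPUTIME,-2,2))+tonumber(substr(CPUTIME,-5,2))*60+tonumber(substr(CPUTIME,-8,2))*60*60+if(match(CPUTIME,"-"),tonumber(replace(CPUTIME,"-.*",""))*24*60*60,0)
| eval intELAPSED = tonumber(substr(ELAPSED,-2,2))+tonumber(substr(ELAPSED,-5,2))*60+tonumber(substr(ELAPSED,-8,2))*60*60+if(match(ELAPSED,"-"),tonumber(replace(ELAPSED,"-.*",""))*24*60*60,0)
| sort 0 _time
| streamstats window=2 current=true global=false count(_time) as rowcount earliest(intCPUTIME) as startCPUTIME latest(intCPUTIME) as endCPUTIME earliest(intELAPSED) as startELAPSED latest(intELAPSED) as endELAPSED by host_dot_PID
| where rowcount=2
| eval CPUAVG=100*(endCPUTIME-startCPUTIME)/(endELAPSED-startELAPSED)
| timechart span=5mins limit=0 avg(CPUAVG) by host_dot_PID
```

With events in ascending order, the first event seen for each host_dot_PID gets rowcount=1 and is dropped, and every later event carries the delta from its predecessor. Sorting the full result set can be expensive on a large time range.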
