
Fix CPU-by-process search

dinisco
Explorer

The *nix app has a CPU-by-process search that doesn't work correctly under certain conditions:

index="os" sourcetype="ps" host="$host$" | multikv fields pctCPU, COMMAND | timechart avg(pctCPU) by COMMAND

The problem is that when a single event contains several processes with the same command name, timechart averages them instead of adding them up. For example, five foo processes each consuming 3% CPU show up as foo=3% when the total is actually 15%.
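The arithmetic behind the problem can be sketched in Python. This is a hypothetical simulation, not Splunk code: `rows` stands in for the rows that multikv might extract from one ps event, and the function names are made up for illustration. It uses the same five foo processes at 3% each, plus one unrelated process.

```python
from collections import defaultdict

# Hypothetical (COMMAND, pctCPU) rows that multikv might extract from a single
# ps event: five "foo" processes at 3% CPU each, plus one "bar" process.
rows = [("foo", 3.0)] * 5 + [("bar", 10.0)]

def avg_by_command(rows):
    """Mimic `timechart avg(pctCPU) by COMMAND` for one event: average per command."""
    values = defaultdict(list)
    for command, pct in rows:
        values[command].append(pct)
    return {cmd: sum(v) / len(v) for cmd, v in values.items()}

def sum_by_command(rows):
    """Total CPU per command, which is what the chart is meant to show."""
    totals = defaultdict(float)
    for command, pct in rows:
        totals[command] += pct
    return dict(totals)

print(avg_by_command(rows))  # {'foo': 3.0, 'bar': 10.0}
print(sum_by_command(rows))  # {'foo': 15.0, 'bar': 10.0}
```

Averaging hides the fact that foo is running five times; only the sum reflects the load that foo puts on the host.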

I worked around this by combining COMMAND with PID so that each series is unique:

index="os" sourcetype="ps" host="$host$" | multikv fields pctCPU, COMMAND, PID | strcat COMMAND "_" PID cmd | where pctCPU>0 | timechart avg(pctCPU) by cmd limit=0
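The effect of the COMMAND_PID workaround can be sketched the same way. This is again a hypothetical Python simulation, not Splunk code; the PIDs and the `series_by_command_pid` helper are made up. Each process gets its own series, so nothing is averaged away, but the number of chart lines grows with the number of processes.

```python
from collections import defaultdict

# Hypothetical (COMMAND, PID, pctCPU) rows from one ps event:
# five "foo" processes at 3% each and one idle process at 0%.
rows = [("foo", pid, 3.0) for pid in range(101, 106)] + [("idle", 200, 0.0)]

def series_by_command_pid(rows):
    """Mimic `strcat COMMAND "_" PID cmd | where pctCPU>0`: one series per process."""
    series = defaultdict(list)
    for command, pid, pct in rows:
        if pct > 0:  # like `where pctCPU>0`
            series[f"{command}_{pid}"].append(pct)
    return dict(series)

s = series_by_command_pid(rows)
print(len(s))        # 5
print(s["foo_101"])  # [3.0]
```

With 50+ processes sharing one COMMAND name, this approach produces 50+ series, one line per process.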

But this gets messy on systems with 50+ processes sharing the same COMMAND name, and Firefox doesn't seem to cope well with limit=0.

Ideally I would sum pctCPU within each event across all processes with the same COMMAND. That would give a single line on the chart for foo at 15%, instead of five foo_$pid lines at 3% each. Is this possible?


gkanapathy
Splunk Employee

You're right. This might do it:

index="os" sourcetype="ps" host="$host$" | multikv fields pctCPU, COMMAND | stats sum(pctCPU) as pctCPU by _time,COMMAND | timechart avg(pctCPU) by COMMAND

That is, sum the CPU for each command at each measurement (all rows that share the same _time) before timechart buckets and averages them.
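The two stages of this search can be sketched as a hypothetical Python simulation. This is not how Splunk implements stats or timechart. The timestamps, values, and helper names are made up. The 300-second bucket width is an assumption, because timechart picks its span from the search time range unless you set one. Stage 1 totals CPU per command per sample, and stage 2 averages those totals within each time bucket.

```python
from collections import defaultdict

# Hypothetical multikv rows: (_time, COMMAND, pctCPU). Two ps samples, 60 s
# apart, that fall into the same timechart bucket (span assumed to be 300 s).
rows = (
    [(0, "foo", 3.0)] * 5 + [(0, "bar", 10.0)]
    + [(60, "foo", 1.0)] * 5 + [(60, "bar", 20.0)]
)

def sum_by_time_command(rows):
    """Stage 1, like `stats sum(pctCPU) as pctCPU by _time,COMMAND`."""
    totals = defaultdict(float)
    for t, command, pct in rows:
        totals[(t, command)] += pct
    return totals

def avg_per_bucket(totals, span=300):
    """Stage 2, like `timechart avg(pctCPU) by COMMAND` with a fixed span."""
    values = defaultdict(list)
    for (t, command), pct in totals.items():
        bucket = t - t % span  # start of the time bucket this sample falls in
        values[(bucket, command)].append(pct)
    return {key: sum(v) / len(v) for key, v in values.items()}

result = avg_per_bucket(sum_by_time_command(rows))
print(result)  # {(0, 'foo'): 10.0, (0, 'bar'): 15.0}
```

foo totals 15% in the first sample and 5% in the second, so the bucket shows their average of 10%. Averaging the ten raw foo rows directly would give 2.0 instead, which is the original problem.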

dinisco
Explorer

That did it, so simple. Thank you kindly, much appreciated.
