Splunk Search

Can you help us create a query for CPU usage and top 10 processes?

kenntun
Engager

I tried to get the CPU usage of the top 10 processes and the total CPU usage with the following queries:

TOP 10 CPU processes

eventtype=nmon:performance type=TOP OStype="*" host="host2" Command=*
| stats max(pct_CPU) AS pct_CPU, max(logical_cpus) AS logical_cpus by _time,host,Command,PID
| stats sum(pct_CPU) As pct_CPU, last(logical_cpus) As logical_cpus by _time,host,Command
| eval key=host+":"+Command
| timechart `nmon_span` limit=10 useother=false max(value) As value by key

Total CPU usage

eventtype=nmon:performance type=CPU_ALL frameID=* host="host2"  | timechart `nmon_span` avg(cpu_load_percent) AS cpu_load_percent by host

However, there is a huge difference between the CPU usage of the top ten processes and the total CPU usage.
Any suggestions?

DalJeanis
Legend

1) Start by changing useother=false to useother=true in your timechart, so that the chart covers the entire CPU usage. If you have dozens or hundreds of Commands, the sum of the highest ten would not be expected to match the total.

2) To compare the two sources, you might want to bin _time with span=1s and use avg(pct_CPU). CPU usage really only matters by host, so start by comparing these two searches:

eventtype=nmon:performance type=TOP OStype="*" host="host2" Command=*
| bin _time span=1s
| stats avg(pct_CPU) AS pct_CPU by _time,host,Command,PID
| stats sum(pct_CPU) As pct_CPU by _time,host
| timechart span=15s avg(pct_CPU) AS pct_cpu


eventtype=nmon:performance type=CPU_ALL frameID=* host="host2" 
| timechart span=15s avg(cpu_load_percent) AS cpu_load_percent

I've removed the "by key" and "by host" clauses from the timecharts so that the two results can be appended together and still give sensible output. If the two charts look reasonably similar, add your original code back step by step until you find where the difference appears.
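
If it helps, here is a rough sketch of how the two searches might be appended into a single chart. It is only one possible way to do it, not a tested solution for your data: the output field name top_pct_CPU is made up for illustration, and the final stats is just there to merge the rows from both searches that share the same _time bucket. Keep in mind that the subsearch inside append is subject to Splunk's subsearch limits, so try it on a short time range first.

```spl
eventtype=nmon:performance type=TOP OStype="*" host="host2" Command=*
| bin _time span=1s
| stats avg(pct_CPU) AS pct_CPU by _time,host,Command,PID
| stats sum(pct_CPU) AS pct_CPU by _time,host
| timechart span=15s avg(pct_CPU) AS top_pct_CPU
| append
    [ search eventtype=nmon:performance type=CPU_ALL frameID=* host="host2"
    | timechart span=15s avg(cpu_load_percent) AS cpu_load_percent ]
| stats values(top_pct_CPU) AS top_pct_CPU, values(cpu_load_percent) AS cpu_load_percent by _time
```

Both timecharts use span=15s, so their _time buckets should line up and each output row should end up with one value per series.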
