
CPU Load on AIX - getting different values from TA-nmon (events), TA-metricator-for-nmon (metrics) for the same host

alexeyglukhov
Path Finder

Hello all,

I have two Nmon TAs running on one AIX host. TA-nmon (event-based) is showing a higher CPU load than TA-metricator-for-nmon (metrics-based) for the same host.

I need help figuring out what might cause this.

SPL on data from TA-nmon (events):

| tstats 
avg(CPU.UPTIME.load_average_1min) AS load_average_1min 
avg(CPU.UPTIME.load_average_5min) AS load_average_5min 
avg(CPU.UPTIME.load_average_15min) AS load_average_15min 
from datamodel=NMON_Data_CPU 
where (nodename = CPU.UPTIME) host= 
groupby _time host prestats=true `nmon_span`
| timechart `nmon_span` 
avg(CPU.UPTIME.load_average_1min) AS load_average_1min 
avg(CPU.UPTIME.load_average_5min) AS load_average_5min 
avg(CPU.UPTIME.load_average_15min) AS load_average_15min

SPL on data from TA-metricator-for-nmon (metrics):

| mstats avg(_value) as value where `nmon_metrics_index` 
(metric_name=os.unix.nmon.system.uptime.load_average_1min OR 
metric_name=os.unix.nmon.system.uptime.load_average_5min OR 
metric_name=os.unix.nmon.system.uptime.load_average_15min) 
host= by metric_name `nmon_span` 
| `extract_metrics("load_average_1min load_average_5min load_average_15min")`
| fillnull value=0 load_average_1min load_average_5min load_average_15min 
| timechart `nmon_span` 
avg(load_average_1min) as load_average_1min 
avg(load_average_5min) as load_average_5min 
avg(load_average_15min) as load_average_15min


Update:
Continuing my investigation.
The source is the output of the "uptime" command, which is exactly the same for both the events and the metrics TA.

From
/opt/splunkforwarder/etc/apps/(TA-metricator-for-nmon OR TA-nmon)/bin/nmon_external_cmd/nmon_external_snap.sh

# Uptime information (uptime command output)
echo "UPTIME,$1,\"`uptime | sed 's/^\s//g' | sed 's/,/;/g'`\"" >>$NMON_FIFO_PATH/nmon_external.dat &
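For illustration, here is a minimal Python sketch of what that shell line does to a sample uptime string. The sample output and the "snap1" timestamp are assumptions; real uptime output varies by host, and the script receives the timestamp as $1:

```python
import re

# Hypothetical sample of AIX `uptime` output; the real string varies by host.
uptime_out = " 02:30PM up 15 days, 4:12, 3 users, load average: 1.50, 1.20, 0.90"

# Mirror `sed 's/^\s//g' | sed 's/,/;/g'`: strip the leading whitespace
# character, then replace every comma with a semicolon so the load averages
# do not break the comma-separated UPTIME line.
cleaned = re.sub(r"^\s", "", uptime_out).replace(",", ";")

# "snap1" stands in for the $1 timestamp argument the script receives.
line = 'UPTIME,{},"{}"'.format("snap1", cleaned)
print(line)
```

Since the commas inside the uptime text become semicolons, the UPTIME record stays a clean three-field CSV line, which both TAs parse identically.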

It turns out that the event-based data is accurate, but the metric values are exactly one third of the event values (see the attached screenshot).

1 Solution

Melstrathdee
Path Finder

Hey Alexey,
Remove line 7 of your SPL:

fillnull value=0 load_average_1min load_average_5min load_average_15min

The fillnull is splitting it into three events, and the timechart then generates the average across them; as two of the three always have a 0 value, it is essentially dividing each value by 3.
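Mel's point can be sketched numerically. A minimal Python sketch (field names taken from the query; the row values are invented for illustration) of how zero-filling plus averaging yields one third:

```python
# Each mstats row carries one metric; after extract_metrics, each row has a
# value in exactly one of the three fields. Simulated rows for one time bucket:
rows = [
    {"load_average_1min": 1.5},   # real 1-min sample
    {"load_average_5min": 1.2},   # real 5-min sample
    {"load_average_15min": 0.9},  # real 15-min sample
]

fields = ["load_average_1min", "load_average_5min", "load_average_15min"]

# fillnull value=0: every row now has all three fields, zeros where missing.
filled = [{f: row.get(f, 0.0) for f in fields} for row in rows]

# timechart avg(...): average each field across all rows in the bucket.
# Two of the three rows contribute 0, so each average is value / 3.
avg = {f: sum(r[f] for r in filled) / len(filled) for f in fields}

print(avg)  # each load average comes out at one third of the real sample
```

Dropping the fillnull leaves each field null (rather than 0) in the rows where it was not sampled, and timechart's avg ignores nulls, so the averages come out correct.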



alexeyglukhov
Path Finder

Indeed, this fixed the problem. Thank you very much, Mel!
So the SPL behind Metricator for NMON's "UPTIME Load Average" panel needs a slight correction; working version:

| mstats avg(_value) as value where `nmon_metrics_index` 
 (metric_name=os.unix.nmon.system.uptime.load_average_1min OR 
 metric_name=os.unix.nmon.system.uptime.load_average_5min OR 
 metric_name=os.unix.nmon.system.uptime.load_average_15min) 
 host= by metric_name `nmon_span` 
 | `extract_metrics("load_average_1min load_average_5min load_average_15min")`
 | timechart `nmon_span` 
 avg(load_average_1min) as load_average_1min 
 avg(load_average_5min) as load_average_5min 
 avg(load_average_15min) as load_average_15min

to4kawa
Ultra Champion

SPL on data from TA-nmon (events):

Try adding a span in your tstats.


alexeyglukhov
Path Finder

Hi @to4kawa
Thanks for the suggestion. My aim is to migrate from event-based dashboards to metrics-based ones, which is why I am currently checking whether the metrics provide exactly the same data as the events.
