
Why is cpu.sh script being indexed incorrectly in Splunk App for Unix and Linux (Splunk_TA_nix)?

neiljpeterson
Communicator

The output from both the script and the command the script runs is correct

server# sar -P ALL 1 1  | awk 'BEGIN {print "CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s  %9s  %9s  %9s  %9s  %9s\n", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle"

CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle
all       6.31       0.00       0.06       0.00      93.63
0         0.99       0.00       0.00       0.00      99.01
1       100.00       0.00       0.00       0.00       0.00
2         0.00       0.00       0.00       0.00     100.00
3         0.00       0.00       0.00       0.00     100.00
4         0.00       0.00       0.00       0.00     100.00
5         0.00       0.00       0.00       0.00     100.00
6         0.00       0.00       0.00       0.00     100.00
7         0.00       0.00       0.00       0.00     100.00
8         0.99       0.00       0.00       0.00      99.01
9         0.99       0.00       0.00       0.00      99.01
10        0.00       0.00       0.00       0.00     100.00
11        0.00       0.00       0.00       0.00     100.00
12        0.00       0.00       1.00       0.00      99.00
13        0.00       0.00       0.00       0.00     100.00
14        0.00       0.00       0.00       0.00     100.00
15        0.00       0.00       0.00       0.00     100.00

However, the indexed result seems off. The number of fields is correct and the field headers are correct, but the values in the CPU field are wrong: the CPU column holds values that look like they belong in the pctUser column (compare the first three rows with those above). I modified the script so that the command would also be indexed.

**Note that the header below is not included in the indexed source, I included it for clarity**
CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle
6.31       0.00       0.00       0.00       0.00      93.69
1.00       0.00       0.00       0.00       0.00      99.00
100.00       0.00       0.00       0.00       0.00       0.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
1.00       0.00       0.00       0.00       0.00      99.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
0.00       0.00       0.00       0.00       0.00     100.00
Cmd = [sar -P ALL 1 1];  | awk 'BEGIN {print "CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s  %9s  %9s  %9s  %9s  %9s\n", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle"

What do you think is going on here?

twollenslegel_s
Splunk Employee

This is now listed as a known issue, with a workaround, for the TA.

http://docs.splunk.com/Documentation/UnixAddOn/5.2.4/User/Releasenotes

0 Karma

fenrisdacat
Explorer

I know this is an ancient question, but here is the issue.

When you run the script locally, it takes your locale into consideration, and sar outputs something like this:

08:28:37 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:28:38 PM     all     11.14      0.00      0.70      0.00      0.00     88.15
08:28:38 PM       0     98.99      0.00      1.01      0.00      0.00      0.00

When Splunk runs it, sar uses a different time format, most likely that of the POSIX locale, so the output above becomes:

20:28:37     CPU     %user     %nice   %system   %iowait    %steal     %idle
20:28:38     all     11.14      0.00      0.70      0.00      0.00     88.15
20:28:38       0     98.99      0.00      1.01      0.00      0.00      0.00

So when the script takes that output and feeds it to awk, the placement of the values is off by one because the AM/PM designation is absent.

The Fix:
You can modify the environment of the user that splunkd runs as and add LC_TIME=en_US (or some other locale that adds AM/PM),
or
add LC_TIME=en_US to the last line of the script, before $CMD:
LC_TIME=en_US $CMD | tee $TEE_DEST | $AWK "$HEADERIZE $FILTER $FORMAT $PRINTF" header="$HEADER"
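
A third option, sketched here only as an illustration, is to make the field positions independent of the timestamp by counting from the end of the line. The sample lines are the ones shown above; the helper names `fixed` and `from_end` are hypothetical, and the offsets assume the eight-column sar layout (CPU through %idle) seen in this thread, so they would need adjusting for other sysstat versions.

```shell
#!/bin/sh
# Sample sar lines copied from the post above: 12-hour and 24-hour clock variants.
line_12h='08:28:38 PM     all     11.14      0.00      0.70      0.00      0.00     88.15'
line_24h='20:28:38     all     11.14      0.00      0.70      0.00      0.00     88.15'

# Fixed positions, as in the stock FORMAT: correct only when AM/PM is present.
fixed() { printf '%s\n' "$1" | awk '{print $3, $4, $NF}'; }

# Positions counted from the end of the line: independent of the timestamp width.
# Assumption: the CPU column is the 7th field from the end, as in the samples above.
from_end() { printf '%s\n' "$1" | awk '{print $(NF-6), $(NF-5), $NF}'; }

fixed "$line_12h"      # prints: all 11.14 88.15
fixed "$line_24h"      # prints: 11.14 0.00 88.15  (off by one)
from_end "$line_12h"   # prints: all 11.14 88.15
from_end "$line_24h"   # prints: all 11.14 88.15
```

Note the parentheses in `$(NF-6)`: without them awk reads `$NF-6` as "the last field minus 6".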

0 Karma

polymorphic
Communicator

I found a solution for my Ubuntu installation!
However, I did not find the reason 😞

Here it goes...
Everything looks good as long as the data is collected by the sar command, but for some unknown reason the sar command fails after some time and the mpstat command is used instead, as the cpu.sh script provides for.
The problem, however, is that the output from 'sar -P ALL 1 1' and 'mpstat -P ALL 1 1' on my Ubuntu and Debian installations isn't what the cpu.sh script expects.

So my solution was to never use the sar command and always use the mpstat command instead, and to change the FORMAT part to suit the actual output:

--

snip

if [ "x$KERNEL" = "xLinux" ] ; then
    queryHaveCommand sar
    FOUND_SAR=$?
    queryHaveCommand mpstat
    FOUND_MPSTAT=$?
#    if [ $FOUND_SAR -eq 0 ] ; then
#        CMD='sar -P ALL 1 1'
#        FORMAT='{cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF}'
#       FORMAT='{cpu=$NF-7; pctUser=$NF-6; pctNice=$NF-5; pctSystem=$NF-4; pctIowait=$NF-1; pctIdle=$NF}'
    if [ $FOUND_MPSTAT -eq 0 ] ; then
        CMD='mpstat -P ALL 1 1'
#        FORMAT='{cpu=$(NF-9); pctUser=$(NF-8); pctNice=$(NF-7); pctSystem=$(NF-6); pctIowait=$(NF-5); pctIdle=$(NF-1)}'
        FORMAT='{cpu=$(NF-10); pctUser=$(NF-9); pctNice=$(NF-8); pctSystem=$(NF-7); pctIowait=$(NF-6); pctIdle=$(NF-1)}'
    else
        failLackMultipleCommands sar mpstat
    fi
    FILTER='/Average|Linux|^$|%/ {next} (NR==1) {next}'
elif [ "x$KERNEL" = "xSunOS" ] ; then

--

snip

edit:
I discovered that the data retrieved wasn't correct, so I had to edit the FORMAT line again. These work for 'mpstat -P ALL 1 1'.
For my Ubuntu 14.04:

FORMAT='{cpu=$(NF-10); pctUser=$(NF-9); pctNice=$(NF-8); pctSystem=$(NF-7); pctIowait=$(NF-6); pctIdle=$(NF)}'

and for my Debian 7.8:

FORMAT='{cpu=$(NF-9); pctUser=$(NF-8); pctNice=$(NF-7); pctSystem=$(NF-6); pctIowait=$(NF-5); pctIdle=$(NF)}'

I believe you have to make your own format line that fits your distribution/version.
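
Rather than maintaining one FORMAT line per distribution, the column positions could in principle be read from the header row itself. The following is a minimal sketch of that idea, not part of Splunk_TA_nix: `parse_cpu` is a hypothetical helper, the sample input is the sar debug output posted elsewhere in this thread, and the `%usr`/`%sys` aliases are an assumption about how some sysstat versions label those columns.

```shell
#!/bin/sh
# parse_cpu is a hypothetical helper, not part of Splunk_TA_nix.
# It learns each column position from the header row instead of hardcoding it.
parse_cpu() {
    awk '
        # First header row: remember where each named column sits.
        /%idle/ && !seen {
            for (i = 1; i <= NF; i++) col[$i] = i
            # Assumption: some sysstat versions use %usr and %sys as labels.
            if (!("%user" in col) && ("%usr" in col)) col["%user"] = col["%usr"]
            if (!("%system" in col) && ("%sys" in col)) col["%system"] = col["%sys"]
            seen = 1
            next
        }
        # Skip banner, blank, header and Average lines (same filter as cpu.sh).
        !seen || /Average|Linux|^$|%/ { next }
        {
            printf "%-3s  %9s  %9s  %9s  %9s  %9s\n", $(col["CPU"]), $(col["%user"]), $(col["%nice"]), $(col["%system"]), $(col["%iowait"]), $(col["%idle"])
        }
    '
}

# Usage with sample sar output from this thread (12-hour clock variant).
parse_cpu <<'EOF'
Linux 3.13.0-57-generic (host2.domain.de) 07/23/2015 x86_64 (1 CPU)

05:03:52 AM CPU %user %nice %system %iowait %steal %idle
05:03:53 AM all 7.58 0.00 0.88 41.79 0.00 49.75
05:03:53 AM 0 3.12 0.00 1.04 10.42 0.00 85.42

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 7.58 0.00 0.88 41.79 0.00 49.75
EOF
```

Because the header and the data rows carry the same timestamp prefix, the lookup works whether or not AM/PM is present. On a live host the input would come from `sar -P ALL 1 1 | parse_cpu` instead of the here-document.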

pkeller
Contributor

I also have the same problem. I have a Splunk ticket open on it but, as of yet, no solution. To further complicate the issue, a restart does work ... but if the restart is initiated via a deployment-server reload, the problem isn't corrected. However, if a restart is issued 'outside' of Splunk, via 'sudo -u splunk_user $SPLUNK_HOME/bin/splunk restart' or 'sudo /sbin/service splunk_service_name restart', then the problem does clear up (until it resurfaces again).

But, if splunkd itself initiates the restart ... nothing changes.

0 Karma

polymorphic
Communicator

I have the exact same problem. However, when I restart Splunk, the CPU numbering works correctly for a period, but after a while the problem reappears.

Restart of Splunk, and everything is ok again.

0 Karma

fleXible
Explorer

I have the very same problem and I can't get it fixed.

From my debugging, it seems to be a problem with awk behaving differently between versions:

Buggy host:
./cpu.sh --debug
CPU pctUser pctNice pctSystem pctIowait pctIdle
1.00 0.00 0.00 0.00 0.00 99.00
1.00 0.00 0.00 0.00 0.00 99.00

Linux 3.16.0-4-686-pae (myhost.domain.de) 07/23/15 i686 (1 CPU)

04:58:15 CPU %user %nice %system %iowait %steal %idle
04:58:16 all 1.00 0.00 0.00 0.00 0.00 99.00
04:58:16 0 1.00 0.00 0.00 0.00 0.00 99.00

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 1.00 0.00 0.00 0.00 0.00 99.00
Average: 0 1.00 0.00 0.00 0.00 0.00 99.00
Cmd = [sar -P ALL 1 1]; | awk 'BEGIN {print "CPU pctUser pctNice pctSystem pctIowait pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s %9s %9s %9s %9s %9s\n", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU pctUser pctNice pctSystem pctIowait pctIdle"

Running it manually in the shell works, though:
myhost:/opt/splunkforwarder/etc/apps/Splunk_TA_nix/bin# sar -P ALL 1 1 | awk 'BEGIN {print "CPU pctUser pctNice pctSystem pctIowait pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s %9s %9s %9s %9s %9s", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU pctUser pctNice pctSystem pctIowait pctIdle"
CPU pctUser pctNice pctSystem pctIowait pctIdle
all 0.00 0.00 1.00 0.00 99.000 0.00 0.00 1.00 0.00 99.00

myhost:/opt/splunkforwarder/etc/apps/Splunk_TA_nix/bin# awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

compiled limits:
max NF 32767
sprintf buffer 1020

Working host:
[root|/opt/splunk/etc/apps/Splunk_TA_nix/bin] ./cpu.sh --debug
CPU pctUser pctNice pctSystem pctIowait pctIdle
all 7.58 0.00 0.88 41.79 49.75
0 3.12 0.00 1.04 10.42 85.42

[root|/opt/splunk/etc/apps/Splunk_TA_nix/bin] cat debug--cpu.sh--Thu_Jul_23_05-03-52_CEST_2015
Linux 3.13.0-57-generic (host2.domain.de) 07/23/2015 x86_64 (1 CPU)

05:03:52 AM CPU %user %nice %system %iowait %steal %idle
05:03:53 AM all 7.58 0.00 0.88 41.79 0.00 49.75
05:03:53 AM 0 3.12 0.00 1.04 10.42 0.00 85.42

Average: CPU %user %nice %system %iowait %steal %idle
Average: all 7.58 0.00 0.88 41.79 0.00 49.75
Average: 0 3.12 0.00 1.04 10.42 0.00 85.42
Cmd = [sar -P ALL 1 1]; | awk 'BEGIN {print "CPU pctUser pctNice pctSystem pctIowait pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s %9s %9s %9s %9s %9s
", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU pctUser pctNice pctSystem pctIowait pctIdle"

correct output with shell script, but incorrect from command line:
[root|/opt/splunk/etc/apps/Splunk_TA_nix/bin] sar -P ALL 1 1 | awk 'BEGIN {print "CPU pctUser pctNice pctSystem pctIowait pctIdle"} /Average|Linux|^$|%/ {next} (NR==1) {next} {cpu=$3; pctUser=$4; pctNice=$5; pctSystem=$6; pctIowait=$7; pctIdle=$NF} {printf "%-3s %9s %9s %9s %9s %9s", cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}' header="CPU pctUser pctNice pctSystem pctIowait pctIdle"
CPU pctUser pctNice pctSystem pctIowait pctIdle
all 2.52 0.00 0.38 0.50 96.600 1.04 0.00 1.04 2.08 95.831 0.98 0.00 1.96 1.96 94.122

[root|/opt/splunk/etc/apps/Splunk_TA_nix/bin] awk --version [bernd]
GNU Awk 4.0.1
Copyright (C) 1989, 1991-2012 Free Software Foundation.

Any ideas on how to fix this?

0 Karma

jcoates_splunk
Splunk Employee

I'm going to try to guess at an answer to this so that it finally goes away from my list of unanswered questions... It could be a permissions or environment issue caused by running the command as whatever user Splunk is running as.
Example 1: root user > sh > sar > awk > splunkd
Example 2: Splunk user > sh > sar > awk > splunkd
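
One way to check the environment theory is to capture the locale settings the script actually sees when splunkd launches it and compare them with those of an interactive shell. This is a minimal sketch under that assumption; `locale_snapshot` is a hypothetical helper and is not part of Splunk_TA_nix.

```shell
#!/bin/sh
# locale_snapshot is a hypothetical helper, not part of Splunk_TA_nix.
# It prints the locale-related variables visible to the current process.
# These settings influence whether sar prints a 12-hour or 24-hour clock.
# Note that LC_ALL, when set, overrides LC_TIME.
locale_snapshot() {
    env | grep -E '^(LANG|LANGUAGE|LC_[A-Z_]+)=' | sort
}

# Run once from an interactive shell and once from a scripted input,
# then compare the two outputs.
locale_snapshot
```

If the two snapshots differ in LC_TIME, LC_ALL or LANG, that would be consistent with the AM/PM column appearing in one context and not the other.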

0 Karma

shou
Explorer

I've got the same problem. Found that restarting the forwarder tends to help, but not fully resolve the issue. For example, this morning we found a system doing this. Restarted the forwarder, and now about 50% of the events are right, and 50% are wrong...

0 Karma