
system metrics not working properly for AIX - vmstat.sh and ps.sh

New Member

Hi. I am on Splunk 6.4.1 with the Unix Add-on (5.2.3). Has anyone had similar issues, or found a way to get the following OS data working on AIX servers?

  1. vmstat.sh --- I get no values (see log below) and I am receiving the following message. I am not sure what is making the script not work. What do you think might be the problem?

SPLUNK LOG
memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi waitThreads interruptsPS pgPageInPS pgPageOut_PS

MANUALLY RAN
[server:splunk]/opt/splunkforwarder/etc/apps/Splunk_TA_nix/bin> ./vmstat.sh
memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi waitThreads interruptsPS pgPageInPS pgPageOut_PS
awk: The field -11 cannot be less than 0.
The input line number is 5.
The source line number is 1.

  2. ps.sh / top.sh -- I am trying to get CPU usage by process and ps is not cutting it. The log below says that splunkd is using "0.8", but it's actually using around 20% on the server (according to top/topas). Has anyone tried collecting this data from the top/topas command on AIX?

SPLUNK LOG
19136670 - 0.8 1-19:52:10 1.0 370340 347348 - A 9-20:11:23 splunkd --nodaemon-p8089_internalexec_splunkd


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

SplunkTrust

Hi,

You should have a look at:

https://splunkbase.splunk.com/app/1753/

Cheers


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Ultra Champion

This might be a known issue, ADDON-14093. Please open a support ticket so Support can confirm whether this is a configuration problem or related to ADDON-14093. Be sure to include your AIX version, because I vaguely recall this having to do with changes in newer AIX versions. Also include the link to this post in case new details are shared here.


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Path Finder

Since this doesn't seem to be going anywhere, I'm logging a support ticket.


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Path Finder

Can you share the solution if you received one from Splunk, or share the ticket number for further follow-up?


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Path Finder

I logged case #561504. I wasn't too pleased when it was reclassified as an enhancement request.
I'm not holding my breath that we'll see this addressed anytime soon ;(


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Path Finder

Thanks for your quick reply. Let me try to reach the Splunk representative.


Re: system metrics not working properly for AIX - vmstat.sh and ps.sh

Explorer

Here are the three issues I discovered:

  1. Two commands (sar and swap -s) need higher privileges (we fixed this with RBAC).
  2. The awk parsing of the vmstat output was incorrect.
  3. hardware.sh does not produce actionable data on our systems, so I removed it.

cpu.sh

sar -P ALL 1 1

vmstat.sh

elif [ "x$KERNEL" = "xAIX" ] ; then
assertHaveCommand uptime
assertHaveCommand ps
assertHaveCommand vmstat
assertHaveCommandGivenPath /usr/sbin/swap
assertHaveCommandGivenPath /usr/bin/svmon
CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1 | tail -1 ; vmstat -s ; svmon; '
PARSE_0='NR==1 {loadAvg1mi=0+$(NF-2)} NR==2 {processes=$1} NR==3 {threads=$1-processes }'
# ps -em lists each process along with its threads (at least one), so the process count must be subtracted to count threads
PARSE_1='(NR==4) {swapUsed=0+$(NF-5); swapFree=0+$(NF-1)} (NR==5) {pgPageInPS=0+$(NF-11); pgPageOutPS=0+$(NF-10)}'
PARSE_2='/^memory / {memTotalMB=$2 / 256 ; memFreeMB=$4 / 256}'
PARSE_3='/paging space page outs$/ {pgPageOut=$1 ; pgSwapOut="?" }'
# AIX has no pgSwapOut parameter, so it cannot be monitored (by Jacky Ho, Systex)
PARSE_4='/cpu context switches$/ {cSwitches=$1} /device interrupts$/ {interrupts=$1 ; forks="?" }'
PARSE_5='/^CPUCOUNT/ {cpuCount=$2}'
MASSAGE="$PARSE_0 $PARSE_1 $PARSE_2 $PARSE_3 $PARSE_4 $PARSE_5 $DERIVE"
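
The division by 256 in the memory parse presumably converts 4 KB pages to megabytes (256 pages per MB). Here is a minimal sketch of that one step, assuming svmon prints its memory line as size, in-use, and free page counts. The sample line is hypothetical, made up to match the mem=14080MB and fre=134358 figures quoted in this post; it is not real svmon output.

```shell
#!/bin/sh
# Hypothetical svmon "memory" line: size, inuse, free, all assumed to be 4 KB pages.
svmon_line='memory 3604480 3470122 134358'

# Same pattern as the memory parse in the script: 256 pages of 4 KB make 1 MB.
# %d truncates the fractional part, which is enough for a quick sanity check.
mem_mb=$(echo "$svmon_line" | awk '/^memory / {printf "%d %d\n", $2 / 256, $4 / 256}')

echo "memTotalMB memFreeMB: $mem_mb"
```

If the total does not match the mem= value that vmstat reports, the page-size assumption is probably wrong for that system.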

This is the change I made to CMD in vmstat.sh (pipe vmstat 1 1 through tail -1 and drop the hardware.sh call).

Before:

    CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1; vmstat -s ; svmon; `dirname $0`/hardware.sh;'

After:

    CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1 | tail -1 ; vmstat -s ; svmon; '

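The reason: the awk step for NR==5 expects the vmstat data row, but plain vmstat 1 1 prints header lines before it. With tail -1, only the data row is passed on: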

$ vmstat 1 1 | tail -1
1 0 981011 134358 0 0 0 0 0 0 12 1184 388 1 1 98 0

$ vmstat 1 1

System Configuration: lcpu=2 mem=14080MB

kthr memory page faults cpu


r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 982403 132966 0 0 0 0 0 0 81 23250 1299 18 17 64 0
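
To see why awk complained about field -11 at input line 5, here is a small sketch that feeds the sample output from this post through the same field arithmetic. It assumes the full vmstat output begins with a blank line, as in the sample; the function name sample_vmstat is only for illustration.

```shell
#!/bin/sh
# Sample text copied from this post: full `vmstat 1 1` output, starting with a blank line.
sample_vmstat() {
cat <<'EOF'

System Configuration: lcpu=2 mem=14080MB

kthr memory page faults cpu

r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 982403 132966 0 0 0 0 0 0 81 23250 1299 18 17 64 0
EOF
}

# Field count of the first vmstat line. In the unpatched CMD this line arrives
# as input line 5, so $(NF-11) there asks awk for a negative field number.
first_line_nf=$(sample_vmstat | awk 'NR==1 {print NF}')

# With tail -1 only the 17-field data row remains: NF-11 is column 6 (pi)
# and NF-10 is column 7 (po).
paging=$(sample_vmstat | tail -1 | awk '{print "pgPageInPS=" ($(NF-11)+0), "pgPageOutPS=" ($(NF-10)+0)}')

echo "fields on first vmstat line: $first_line_nf"
echo "$paging"
```

A guard such as NF >= 17 in the awk pattern would be another way to skip the header lines, but tail -1 keeps the line numbering that the rest of the parse steps rely on.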

These issues were found while running on AIX 7.1.
