system metrics not working properly for AIX - vmstat.sh and ps.sh

jnguyen413
New Member

Hi. I am on Splunk 6.4.1 with the Unix Add-on (5.2.3). Has anyone had similar issues, or found a way to get the following OS data working on AIX servers?

  1. vmstat.sh --- I get no values, only the header row (see log below), and running the script manually produces the awk error shown. I am not sure what is making the script fail. What do you think might be the problem?

SPLUNK LOG
memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi waitThreads interrupts_PS pgPageIn_PS pgPageOut_PS

MANUALLY RAN
[server:splunk]/opt/splunkforwarder/etc/apps/Splunk_TA_nix/bin> ./vmstat.sh
memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi waitThreads interrupts_PS pgPageIn_PS pgPageOut_PS
awk: The field -11 cannot be less than 0.
The input line number is 5.
The source line number is 1.

  2. ps.sh / top.sh -- I am trying to get CPU by process and ps is not cutting it. The entry below reports splunkd at "0.8", but it's actually running at around 20% on the server (according to top/topas). Has anyone tried getting data through the top/topas command on AIX?

SPLUNK LOG
19136670 - 0.8 1-19:52:10 1.0 370340 347348 - A 9-20:11:23 splunkd --nodaemon_-p_8089__internal_exec_splunkd
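
One possible explanation, offered as a sketch rather than a confirmed diagnosis: ps generally reports CPU use averaged over the whole life of the process, while top/topas show the current interval. Assuming the fourth field of the log entry (1-19:52:10) is cumulative CPU time and the tenth (9-20:11:23) is elapsed time, the lifetime average can be recomputed by hand. The helper name to_seconds is made up for this example.

```shell
#!/bin/sh
# Convert a ps time value of the form [[dd-]hh:]mm:ss to seconds.
# to_seconds is a hypothetical helper, not part of Splunk_TA_nix.
to_seconds() {
    echo "$1" | awk '{
        days = 0
        t = $0
        # Split off the optional "dd-" day prefix.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        # Fold hh:mm:ss (or mm:ss) into seconds, left to right.
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Values copied from the log entry above; their meaning is an assumption.
cpu_secs=$(to_seconds "1-19:52:10")
elapsed_secs=$(to_seconds "9-20:11:23")

# CPU time as a percentage of wall-clock lifetime.
lifetime_pct=$(awk -v c="$cpu_secs" -v e="$elapsed_secs" \
    'BEGIN { printf "%.1f", 100 * c / e }')

echo "cpu=${cpu_secs}s elapsed=${elapsed_secs}s lifetime_avg=${lifetime_pct}%"
```

The result is much closer to the topas figure than to 0.8. As I understand the AIX ps documentation, the reported value is further divided by the number of available CPUs, which would account for the remaining gap; the CPU count of this server is not given here, so treat that part as an assumption.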

rgaffur
Explorer

Here are the three issues I discovered

  1. There are 2 functions (sar & swap -s) that need higher privileges (we fixed it with RBAC)
  2. The vmstat parsing via awk was incorrect
  3. hardware.sh does not produce actionable data on our systems. I removed it

cpu.sh

sar -P ALL 1 1

vmstat.sh

elif [ "x$KERNEL" = "xAIX" ] ; then
assertHaveCommand uptime
assertHaveCommand ps
assertHaveCommand vmstat
assertHaveCommandGivenPath /usr/sbin/swap
assertHaveCommandGivenPath /usr/bin/svmon
CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1 | tail -1 ; vmstat -s ; svmon; '
PARSE_0='NR==1 {loadAvg1mi=0+$(NF-2)} NR==2 {processes=$1} NR==3 {threads=$1-processes }'
# ps -em includes processes with their threads (at least one each), so processes must be excluded to count threads #
PARSE_1='(NR==4) {swapUsed=0+$(NF-5); swapFree=0+$(NF-1)} (NR==5) {pgPageIn_PS=0+$(NF-11); pgPageOut_PS=0+$(NF-10)}'
PARSE_2='/^memory / {memTotalMB=$2 / 256 ; memFreeMB=$4 / 256}'
PARSE_3='/paging space page outs$/ {pgPageOut=$1 ; pgSwapOut="?" }'
# no pgSwapOut parameter and can't be monitored in AIX (by Jacky Ho, Systex)
PARSE_4='/cpu context switches$/ {cSwitches=$1} /device interrupts$/ {interrupts=$1 ; forks="?" }'
PARSE_5='/^CPU_COUNT/ {cpuCount=$2}'
MASSAGE="$PARSE_0 $PARSE_1 $PARSE_2 $PARSE_3 $PARSE_4 $PARSE_5 $DERIVE"

This is the change I made in vmstat.sh:

    CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1; vmstat -s ; svmon; `dirname $0`/hardware.sh;'

-->
CMD='eval uptime ; ps -e | wc -l ; ps -em | wc -l ; /usr/sbin/swap -s ; vmstat 1 1 | tail -1 ; vmstat -s ; svmon; '

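The reason: PARSE_1 expects the vmstat data row on input line 5 (after uptime, the two ps counts, and swap -s). Plain vmstat 1 1 prints its header first, so line 5 is blank, NF is 0, and $(NF-11) becomes field -11. Piping through tail -1 leaves only the data row: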

$ vmstat 1 1 | tail -1
1 0 981011 134358 0 0 0 0 0 0 12 1184 388 1 1 98 0

$ vmstat 1 1

System Configuration: lcpu=2 mem=14080MB

kthr memory page faults cpu


r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 982403 132966 0 0 0 0 0 0 81 23250 1299 18 17 64 0
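
The awk error from the question can be reproduced without an AIX box. This is a minimal sketch that replays the vmstat 1 1 output shown above through the same field arithmetic PARSE_1 uses; sample_vmstat is a stand-in function made up for this example, and it assumes the first line of the real output is blank, as the listing suggests.

```shell
#!/bin/sh
# Stand-in for `vmstat 1 1`, replaying the sample output from this post.
# sample_vmstat is hypothetical; on a real AIX host call vmstat directly.
sample_vmstat() {
cat <<'EOF'

System Configuration: lcpu=2 mem=14080MB

kthr memory page faults cpu


r b avm fre re pi po fr sr cy in sy cs us sy id wa
3 0 982403 132966 0 0 0 0 0 0 81 23250 1299 18 17 64 0
EOF
}

# Without tail -1, the first vmstat line is blank, so NF is 0 there
# and $(NF-11) would ask awk for field -11.
first_nf=$(sample_vmstat | awk 'NR==1 {print NF}')

# With tail -1 only the data row remains: 17 fields, so NF-11 is
# column 6 (pi) and NF-10 is column 7 (po).
fields=$(sample_vmstat | tail -1 | awk '{print NF}')
paging=$(sample_vmstat | tail -1 | awk '{print "pi=" $(NF-11), "po=" $(NF-10)}')

echo "fields on first line: $first_nf"
echo "fields on data row:   $fields"
echo "parsed:               $paging"
```

Both paging counters are zero in this sample, so the parsed values are not very exciting, but the field counts show why the unfiltered output breaks the parse.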
I discovered these issues while running on AIX 7.1.

sloshburch
Splunk Employee

This might be known issue ADDON-14093. Please open a support ticket so Support can confirm whether this is config related or related to ADDON-14093. Make sure to include your AIX version, because I vaguely recall this having to do with changes in newer AIX versions. Also include the link to this post in case new details are shared here.

jmantor
Path Finder

Since this doesn't seem to be going anywhere, I'm logging a support ticket.

praphulla1
Path Finder

Can you share the solution if you received one from Splunk, or share the ticket number for further follow-up?

jmantor
Path Finder

I logged case #561504. I wasn't too pleased when it was reclassified as an enhancement request.
I'm not holding my breath that we'll see this addressed anytime soon :(

praphulla1
Path Finder

Thanks for your quick reply. Let me try to reach the Splunk representative.

guilmxm
SplunkTrust

Hi,

You should have a look at:

https://splunkbase.splunk.com/app/1753/

Cheers
