Splunk TA for Solaris 11 doesn't give correct data

Engager

I have installed the Splunk TA for Solaris 11 add-on on my Solaris 5.11 server. I am getting server stats from the servers, but the vmstat data is not correct: it reports negative memory usage.

I believe the commands that vmstat.sh uses are not producing the expected output on Solaris 11 servers.

Has anyone else faced the same issue and managed to fix it?

Contributor

I wrote this app four years ago and it's no longer under active development. I don't have any Solaris boxes left, but I'll have a swing at it anyway!

Can you please provide the output of vmstat.sh run from the command line, and also the output of vmstat -q 1 1, which is the relevant command that it runs? With Solaris there are usually better commands than the generic ones. The main reason I wrote the TA and app in the first place was that Solaris offers much better ways to get the stats in most cases, plus lots of extra information that the standard NIX app just doesn't know how to get.

Perhaps vmstat itself is giving the wrong answer; perhaps the app is formatting it wrongly (more likely). Without having seen what you're getting, I suspect a rounding error.

This script was one where I attempted to stick with the platform-independent style used in the official Splunk NIX TA. Later I gave up and rewrote the scripts wholesale, as it was just getting too complicated, and Solaris has diverged very far from generic NIX if you really want to know what's going on.

You could most likely get much more accurate results by ditching vmstat altogether and re-implementing it with dtrace. This is no longer something I'm prepared to do.

I see one problem already though, the next command in the script, vmstat -s, should have -S as the option.

Cheers,
Charles

Engager

I fixed the problem by changing the scripts for vmstat and top inputs.

Thanks Charles for the guidance.

Engager

Thanks Charles. It seems to be due to the vmstat command producing different output on Linux and Solaris servers.

The script expects different output from what Solaris produces, so it is picking up free swap as free physical memory.
I tried to explore the dtrace option, but I don't seem to have the required permissions. I will check with the infra team on this.
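
One way to check which field a script is reading is to number the columns of the vmstat header line and compare them with the field index used in the script. This is only a minimal sketch: list_columns is a hypothetical helper name, and the header passed in below is an illustrative line, not output captured from a real server.

```shell
#!/bin/sh
# list_columns is a hypothetical helper: it reads one header line on stdin
# and prints each whitespace-separated column name with its awk field index.
list_columns() {
    awk 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i }'
}

# Illustrative header only (an assumption, not captured from a real box).
printf 'r b w swap free re mf\n' | list_columns
```

On the server itself, something like vmstat 1 1 | sed -n 2p | list_columns should give the real numbering, assuming the column names are on the second line of the output.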

I understand you don't want to spend time on this anymore, but it would be very much appreciated if you could point me in the right direction for making changes in vmstat.sh.

Contributor

OK, since I don't have a Solaris box any more this will be a bit tricky 🙂

The part of the script that executes the Solaris-specific commands is here:
elif [ "x$KERNEL" = "xSunOS" ] ; then
    assertHaveCommand vmstat
    assertHaveCommandGivenPath /usr/sbin/swap
    assertHaveCommandGivenPath /usr/sbin/prtconf
    assertHaveCommand prstat
    CMD='eval /usr/sbin/prtconf 2>/dev/null | grep Memory ; /usr/sbin/swap -s ; vmstat -q 1 1 ; vmstat -s ; prstat -n 1 1 1'
    PARSE0='/^Memory size:/ {memTotalMB=$3} (NR==5) {memFreeMB=$5 / 1024}'
    PARSE1='(NR==2) {swapUsed=0+$(NF-3); swapFree=0+$(NF-1)}'
    PARSE2='/pages paged out$/ {pgPageOut=$1} /pages swapped out$/ {pgSwapOut=$1}'
    PARSE3='/cpu context switches$/ {cSwitches=$1} /device interrupts$/ {interrupts=$1} / v?forks$/ {forks+=$1}'
    PARSE4='/^Total: / {processes=$2; threads=$4; loadAvg1mi=0+$(NF-2)}'
    MASSAGE="$PARSE0 $PARSE1 $PARSE2 $PARSE3 $PARSE4 $DERIVE"

You'll see that, having executed a series of commands, it runs the output through PARSE0 to PARSE4 and then DERIVE, which can be found elsewhere in the script. If you're getting a negative value, it's probably coming from DERIVE, which is the only place where subtractions occur. You can debug it by echoing the values seen by DERIVE and seeing which one looks wrong. It should be simple enough. Don't forget the -S change to the second vmstat.
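
A rough sketch of that kind of debugging, which can be tried without touching the real script. PARSE0 is copied from the snippet above, but DERIVE_DEBUG, debug_derive and sample_output are hypothetical names. The actual DERIVE isn't shown here, so the subtraction is only an assumption about what it does, and the sample input is made up to show how a too-large "free" field turns into a negative result.

```shell
#!/bin/sh
# PARSE0 is taken from the snippet above.
PARSE0='/^Memory size:/ {memTotalMB=$3} (NR==5) {memFreeMB=$5 / 1024}'

# Hypothetical stand-in for DERIVE (the real rule is not shown in this thread).
# It performs an assumed subtraction and prints every value involved.
DERIVE_DEBUG='END {
    memUsedMB = memTotalMB - memFreeMB   # assumption, not the real DERIVE
    printf "memTotalMB=%s memFreeMB=%s memUsedMB=%s\n", memTotalMB, memFreeMB, memUsedMB
}'

# Run the combined awk program over whatever arrives on stdin.
debug_derive() {
    awk "$PARSE0 $DERIVE_DEBUG"
}

# Made-up input with the same line positions the script relies on:
# line 1 is the prtconf memory line, line 5 is the vmstat data line.
sample_output() {
    printf '%s\n' \
        'Memory size: 4096 Megabytes' \
        'placeholder for the swap -s line' \
        'placeholder for the first vmstat header line' \
        'placeholder for the second vmstat header line' \
        '0 0 0 2097152 8388608'
}

sample_output | debug_derive
```

With this made-up input, field 5 is larger than total memory, so the assumed subtraction goes negative. Piping the real command output into debug_derive in place of sample_output should show which parsed value looks wrong on your server.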

vmstat manual link here: https://docs.oracle.com/cd/E23824_01/html/821-1462/vmstat-1m.html
HTH
