
*nix or Splunk_TA_nix apps: vmstat.sh giving incorrect free memory on RHEL 5 & 6

Explorer

The vmstat.sh script in both the *nix app and the TA_nix app does not report free memory correctly. I can fix the scripts to give accurate information, but I was wondering whether anyone else has seen this and had to change them manually.

Here is the code block from the vmstat.sh for Linux:

DERIVE='END {memUsedMB=memTotalMB-memFreeMB; memUsedPct=(100.0*memUsedMB)/memTotalMB; memFreePct=100.0-memUsedPct; swapUsedPct=swapUsed ? (100.0*swapUsed)/(swapUsed+swapFree) : 0}'

if [ "x$KERNEL" = "xLinux" ] ; then
        assertHaveCommand uptime
        assertHaveCommand ps
        assertHaveCommand vmstat
        CMD='eval uptime ; ps -e | wc -l ; ps -eT | wc -l ; vmstat -s'
        PARSE_0='NR==1 {loadAvg1mi=0+$(NF-2)} NR==2 {processes=$1} NR==3 {threads=$1}'
        PARSE_1='/total memory$/ {memTotalMB=$1/1024} /free memory$/ {memFreeMB+=$1/1024} /buffer memory$/ {memFreeMB+=$1/1024} /swap cache$/ {memFreeMB+=$1/1024}'
        PARSE_2='/pages paged out$/ {pgPageOut=$1} /used swap$/ {swapUsed=$1} /free swap$/ {swapFree=$1} /pages swapped out$/ {pgSwapOut=$1}'
        PARSE_3='/interrupts$/ {interrupts=$1} /CPU context switches$/ {cSwitches=$1} /forks$/ {forks=$1}'
        MASSAGE="$PARSE_0 $PARSE_1 $PARSE_2 $PARSE_3 $DERIVE"

Here is the output from that on the cli:

./vmstat.sh

memTotalMB memFreeMB memUsedMB memFreePct memUsedPct pgPageOut swapUsedPct pgSwapOut cSwitches interrupts forks processes threads loadAvg1mi

129181 122331 6850 94.7 5.3 83083638 0.0 5 2474094895 582278992 124945211 791 1270 0.09

And here is the output from vmstat -s on its own (which is what the script uses):

  vmstat -s
        132282320  total memory
        112188224  used memory
         67357104  active memory
         40568500  inactive memory
         20094096  free memory
           251012  buffer memory
        104906704  swap cache
        134119416  total swap
               20  used swap
        134119392  free swap
         63568055 non-nice user cpu ticks
              439 nice user cpu ticks
         27415594 system cpu ticks
       1796620745 idle cpu ticks
         44553858 IO-wait cpu ticks
            19458 IRQ cpu ticks
          2447076 softirq cpu ticks
                0 stolen cpu ticks
         76806178 pages paged in
         83103846 pages paged out
                0 pages swapped in
                5 pages swapped out
        583605396 interrupts
       2475721333 CPU context switches
       1377403815 boot time
        124948563 forks

Here is output from free:

free
             total       used       free     shared    buffers     cached
Mem:     132282320  112170712   20111608          0     251164  104907988
-/+ buffers/cache:    7011560  125270760
Swap:    134119416         20  134119396

I can see the math going on in the DERIVE variable, but I can't make sense of it. I was trying to run historical analysis on some Oracle DB servers that were having issues, and the *nix app shows 93% of RAM free, which is obviously wrong. Luckily we have munin and nagios, but it would have been nice to get all of this out of Splunk.
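For anyone tracing the DERIVE math by hand, here is a minimal sketch that replays the memory part of PARSE_1 and DERIVE against the four relevant lines of the vmstat -s output above. The awk patterns and formulas are copied from the script excerpt; the `sample` and `result` variables and the printf format (integer MB, one decimal for percentages) are assumptions made for illustration, not taken from the app.

```shell
# Four lines copied from the vmstat -s output above (values are in KB).
sample='        132282320  total memory
         20094096  free memory
           251012  buffer memory
        104906704  swap cache'

# Same patterns as PARSE_1, followed by the memory part of DERIVE.
# Assumption: the output format below is illustrative only.
result=$(printf '%s\n' "$sample" | awk '
    /total memory$/  {memTotalMB = $1/1024}
    /free memory$/   {memFreeMB += $1/1024}
    /buffer memory$/ {memFreeMB += $1/1024}
    /swap cache$/    {memFreeMB += $1/1024}
    END {
        memUsedMB  = memTotalMB - memFreeMB
        memUsedPct = (100.0 * memUsedMB) / memTotalMB
        memFreePct = 100.0 - memUsedPct
        printf "%d %d %d %.1f %.1f\n", memTotalMB, memFreeMB, memUsedMB, memFreePct, memUsedPct
    }')

# Columns: memTotalMB memFreeMB memUsedMB memFreePct memUsedPct
echo "$result"
```

With these inputs the sketch prints `129181 122316 6865 94.7 5.3`. The small difference from the 122331 and 6850 in the ./vmstat.sh output is expected, because the two commands were run at different moments. The figure lines up with the "-/+ buffers/cache" row of free (125270760 KB is roughly 122334 MB), so memFreeMB here is free plus buffers plus cache, not the "free" column.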

Just curious what everyone else is doing with this?


Explorer

After spending some time troubleshooting these boxes, I now see what it is doing. It adds the buffer and cached memory to the free total, which in effect subtracts the cache from used memory, because it does not count cache as "used memory." I'm not sure this is the best method, since a lot of admins would like to know how much memory is being used for cache.
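If you want both views, one option is to keep the pieces separate instead of folding them into a single number. The sketch below is only an illustration of that idea, not the app's code: the field names memBuffersMB, memCachedMB and memReclaimableMB are hypothetical, and Splunk dashboards that expect the stock vmstat.sh fields would need adjusting to use them.

```shell
# Four lines copied from the vmstat -s output above (values are in KB).
sample='        132282320  total memory
         20094096  free memory
           251012  buffer memory
        104906704  swap cache'

# Hypothetical variant of PARSE_1: track free, buffers and cache separately.
# memFreePct here is strictly "free memory" as a share of total.
result=$(printf '%s\n' "$sample" | awk '
    /total memory$/  {memTotalMB   = $1/1024}
    /free memory$/   {memFreeMB    = $1/1024}
    /buffer memory$/ {memBuffersMB = $1/1024}
    /swap cache$/    {memCachedMB  = $1/1024}
    END {
        # Free plus buffers plus cache, i.e. what the stock script calls memFreeMB.
        memReclaimableMB = memFreeMB + memBuffersMB + memCachedMB
        memFreePct = (100.0 * memFreeMB) / memTotalMB
        printf "memFreeMB=%d memBuffersMB=%d memCachedMB=%d memReclaimableMB=%d memFreePct=%.1f\n",
            memFreeMB, memBuffersMB, memCachedMB, memReclaimableMB, memFreePct
    }')

echo "$result"
```

On the sample above this prints `memFreeMB=19623 memBuffersMB=245 memCachedMB=102447 memReclaimableMB=122316 memFreePct=15.2`, which makes it clear that most of the "free" 94.7% is page cache.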
