NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

Path Finder

I have 17 AIX systems reporting into the NMON Performance Monitor for Unix and Linux Systems. I can see all the inventory data, but none of the performance data is populating into the application. At one point in time it was working, so any ideas on how to troubleshoot this?

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

SplunkTrust

Hello,

Could you please follow the troubleshooting guide:

http://nmon-for-splunk.readthedocs.io/en/latest/Userguide.html#troubleshooting-guide-from-a-to-z

This should guide you to the root cause of your issue.

You can contact me through the application main page.

Regards,

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

Path Finder

OK, after looking through everything, I see a series of these messages:

11-23-2016 10:28:26.439 -0800 INFO  TailReader - Archive file='/opt/splunkforwarder/var/log/nmon/var/nmon_repository/wasacpt01_161123_0854.nmon' has stopped changing, will read it now.

It doesn't look like it ever stops changing...

11-23-2016 10:29:17.774 -0800 INFO  TailReader - Archive file='/opt/splunkforwarder/var/log/nmon/var/nmon_repository/wasacpt01_161123_0854.nmon' updated less than 10000ms ago, will not read it until it stops changing. File size=716204

It never processes the data...any suggestions?
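
To see whether a file ever gets past the "will not read it" state, the two kinds of TailReader messages can be counted per file. This is only a rough sketch: the function name `tailreader_summary` is made up for illustration, and the log is assumed to be the forwarder's splunkd.log (typically under /opt/splunkforwarder/var/log/splunk/, which is an assumption about this install).

```shell
#!/bin/sh
# Summarise TailReader activity for one monitored file in a splunkd.log.
# Usage: tailreader_summary <splunkd.log> <monitored file path>
# (hypothetical helper, not part of Splunk or the TA-nmon)
tailreader_summary() {
    log_file=$1
    target=$2
    # grep -F matches the path as a fixed string; grep -c counts matching lines.
    # "|| true" keeps the function usable under "set -e" when the count is 0.
    waiting=$(grep -F "file='$target'" "$log_file" \
        | grep -c "will not read it until it stops changing" || true)
    ready=$(grep -F "file='$target'" "$log_file" \
        | grep -c "has stopped changing, will read it now" || true)
    echo "waiting=$waiting ready=$ready"
}
```

A "waiting" count that keeps growing while "ready" stays flat would suggest the file is being rewritten faster than the forwarder is willing to read it.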

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

SplunkTrust

I see.

For some reason, Nmon might take too much time to complete a run, and/or the files change too fast for Splunk to manage them.

You have two ways to solve the situation:

  • Increase the time in seconds between two nmon cycles. The default is 60 seconds, which is apparently too short in your case.

To increase the time between two nmon cycles (known as the INTERVAL value), follow these instructions:

[http://nmon-for-splunk.readthedocs.io/en/latest/Userguide.html#manage-the-volume-of-data-generated-b...]

In short, all you have to do is create your own local/nmon.conf and set the appropriate value.

  • Use the TA-nmon_selfmode instead of the standard TA-nmon, it is an alternative version of the TA-nmon which does not use Splunk file monitoring.

It won't be affected by the issue you have. You will find it in the same resources directory of the core application.
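
For the first option, a local/nmon.conf could be as small as the sketch below. The key names `mode`, `custom_interval` and `custom_snapshot` are the ones used in the app's nmon.conf; the values 120 and 120 are only illustrative and should be tuned to your environment.

```shell
# local/nmon.conf (illustrative values, adjust to your needs)
# The file is sourced by nmon_helper.sh, so it uses shell variable syntax.

# Switch to custom mode so the two values below are taken into account
mode="custom"

# Seconds between two nmon measures (the INTERVAL value)
custom_interval="120"

# Number of measures nmon performs before the process ends
custom_snapshot="120"
```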

Please let me know.

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

Path Finder

I created my own nmon.conf file specifically for 12 servers:

root@dply1.splunk (Linux) $ more nmon.conf
# nmon.conf

# This configuration file will set the interval and snapshot values when starting up the nmon binary
# It is being sourced by the nmon_helper.sh script during script startup

# *** FILE ENCODING: UTF-8 ! ***
# When creating a local/nmon.conf, pay attention to file encoding specially when working under Windows.
# The file must be UTF-8 encoded or you may run in trouble.

### NMON COLLECT OPTIONS ###

# The nmon_helper.sh input script is set by default to run every 60 seconds
# If Nmon is not running, the script will start Nmon using the configuration above

# The default mode for Nmon data generation is set to "longperiod_low" which is the most preservative mode to limit the CPU usage due the Nmon/Splunk processing steps
# Feel free to test available modes or custom mode to set better options that answer your needs and requirements

# The "longperiod_high" mode is a good compromise between accuracy, CPU / licensing cost and operational intelligence, and should relevant for very large deployment in Production environments

# Available modes for proposal below:

#   shortperiod_low)
#           interval="60"
#           snapshot="10"

#   shortperiod_middle)
#           interval="30"
#           snapshot="20"

#   shortperiod_high)
#           interval="20"
#           snapshot="30"

#   longperiod_low)
#           interval="240"
#           snapshot="120"

#   longperiod_middle)
#           interval="120"
#           snapshot="120"

#   longperiod_high)
#           interval="60"
#           snapshot="120"

# Benchmarking of January 2015 with Version 1.5.0 shows that:

# longperiod_middle --> CPU usage starts to notably increase after 4 hours of Nmon runtime


# custom --> Set a custom interval and snapshot value, if unset short default values will be used (see custom_interval and custom_snapshot)

# Default is longperiod_high
mode="custom"

# Refresh interval in seconds, Nmon will use this value to refresh data each X seconds
# UNUSED IF NOT SET TO custom MODE
custom_interval="180"

# Number of Data refresh occurrences, Nmon will refresh data X times
# UNUSED IF NOT SET TO custom MODE
custom_snapshot="360"

### VARIOUS COMMON OPTIONS ###

# Time in seconds of margin before running a new iteration of Nmon process to prevent data gaps between 2 iterations of Nmon
# the nmon_helper.sh script will spawn a new Nmon process when the age in seconds of the current process gets higher than this value

# The endtime is evaluated the following way:
# endtime=$(( ${interval} * ${snapshot} - ${endtime_margin} ))

# When the endtime gets higher than the endtime_margin, a new Nmon process will be spawned
# default value to 240 seconds which will start a new process 4 minutes before the current process ends

# Setting this value to "0" will totally disable this feature

endtime_margin="240"

### NFS OPTIONS ###

# Change to "1" to activate NFS V2 / V3 (option -N) for AIX hosts
AIX_NFS23="0"

# Change to "1" to activate NFS V4 (option -NN) for AIX hosts
AIX_NFS4="0"

# Change to "1" to activate NFS V2 / V3 / V4 (option -N) for Linux hosts
# Note: Some versions of Nmon introduced a bug that makes Nmon to core when activating NFS, ensure your version is not outdated
Linux_NFS="0"

### LINUX OPTIONS ###

# Change the priority applied while looking at nmon binary
# by default, the nmon_helper.sh script will use any nmon binary found in PATH
# Set to "1" to give the priority to embedded nmon binaries
# Note: Since release 1.6.07, priority is given by default to embedded binaries
Linux_embedded_nmon_priority="1"

# Change the limit for processes and disks capture of nmon for Linux
# In default configuration, nmon will capture most of the process table by capturing main consuming processes
# This function is percentage limit of CPU time, with a default limit of 0.01
# Changing this value can influence the volume of data to be generated, and the associated CPU overhead for that data to be parsed

# Possible values are:
# Linux_unlimited_capture="0" --> Default nmon behavior, capture main processes (no -I option)
# Linux_unlimited_capture="-1" --> Set the capture mode to unlimited (-I -1)
# Linux_unlimited_capture="x.xx" --> Set the percentage limit to a custom value, ex: "0.01" will set "-I 0.01"
Linux_unlimited_capture="0"

# Set the maximum number of devices collected by Nmon, default is set to 1500 devices
# This option will be ignored if you set the Linux_unlimited_capturation below.
# Increase this value if you have systems with more devices
# Up to 3000 devices will be taken in charge by the Application (hard limit in nmon2csv.py / nmon2csv.pl)
Linux_devices="1500"

### SOLARIS OPTIONS ###

# Change to "1" to activate VxVM volumes IO statistics
Solaris_VxVM="0"

# UARG collection (new in Version 1.11), Change to "0" to deactivate, "1" to activate (default is activate)
Solaris_UARG="1"

### AIX COMMON OPTIONS ###

# Change this line if you add or remove common options for AIX, do not change NFS options here (see NFS options)
# the -p option is mandatory as it is used at launch time to save instance pid
AIX_options="-f -T -A -d -K -L -M -P -^ -p"

Hopefully this looks ok
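
As a quick sanity check on these values, the arithmetic from the `endtime` comment in the file can be worked through in the shell. This is a minimal sketch; the function name `nmon_endtime` is hypothetical and is not something the TA-nmon provides.

```shell
#!/bin/sh
# Reproduce the formula quoted in nmon.conf:
#   endtime=$(( ${interval} * ${snapshot} - ${endtime_margin} ))
# Usage: nmon_endtime <interval> <snapshot> <endtime_margin>
nmon_endtime() {
    interval=$1
    snapshot=$2
    endtime_margin=$3
    echo $(( interval * snapshot - endtime_margin ))
}

# With the values from this file:
#   nmon_endtime 180 360 240
# 180 * 360 = 64800 seconds (18 hours) for one nmon run,
# minus the 240 second margin gives 64560.
```

For comparison, the default longperiod_high mode (interval 60, snapshot 120) gives a run of 7200 seconds, or 2 hours.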

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

SplunkTrust

Hi, yes, that looks good.
Once it is applied, it is best to kill the current nmon process on those servers; a new nmon process will then be spawned using these parameters. (Alternatively, you can wait about 2 hours at most for the current process to finish.)
After that you should be good.
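
One possible way to locate the running nmon process before killing it is sketched below. The helper name `nmon_pids` is made up, and the match on a binary called `nmon` or `topas_nmon` is an assumption; check the actual command name on your hosts and adjust the pattern.

```shell
#!/bin/sh
# Print the PIDs of nmon processes found in "ps -eo pid,args" style input.
# Reads from stdin so it can be tried on saved output as well.
# (hypothetical helper; the binary name pattern is an assumption)
nmon_pids() {
    # $1 is the PID, $2 is the command (first word of the argument list)
    awk '$2 ~ /(^|\/)(topas_)?nmon$/ { print $1 }'
}

# Typical use on a server:
#   ps -eo pid,args | nmon_pids
# then, after reviewing the list:
#   kill <pid>
```

Matching only on the command word avoids picking up nmon_helper.sh or the grep/awk process itself.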

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

Path Finder

So I let nmon run overnight, hoping a new nmon process would spawn... I'm now getting this error:

the APP directory could not be defined, is TA-nmon / PA-nmon installed ?

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

SplunkTrust

Hi,

Can we continue by email, please?

You can contact me through the app main page.

Re: NMON Performance Monitor for Unix and Linux Systems: How to troubleshoot why performance data from AIX systems is no longer getting indexed?

SplunkTrust

I would need more information. This message can come from different places (generating nmon processes, processing nmon files), but I don't see why you would get it.

Changing these nmon parameters through local/nmon.conf is normally quite simple.

Let's restart from scratch:

  • Take one server and connect to it in a terminal
  • Verify whether an nmon process is running
  • Try running nmon_helper.sh manually, which is very simple:

/opt/splunkforwarder/bin/splunk cmd /opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh

  • This script reads local/nmon.conf and launches the nmon process with the interval and snapshot parameters, which you can see by looking at the arguments of the running nmon process

All these steps and information are in the troubleshooting guide you read earlier.
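
Before running the helper by hand, it may be worth confirming that the app files are where the command above expects them. The sketch below only checks paths; `check_ta_nmon` is a hypothetical name, and whether these checks relate to the "APP directory could not be defined" message is an assumption, not something confirmed by the script itself.

```shell
#!/bin/sh
# Check that the TA-nmon files exist under a given Splunk home.
# Usage: check_ta_nmon [splunk_home]   (defaults to /opt/splunkforwarder)
# (hypothetical helper, not part of the TA-nmon)
check_ta_nmon() {
    splunk_home=${1:-/opt/splunkforwarder}
    app_dir="$splunk_home/etc/apps/TA-nmon"

    if [ ! -d "$app_dir" ]; then
        echo "missing app directory: $app_dir"
        return 1
    fi
    if [ ! -f "$app_dir/bin/nmon_helper.sh" ]; then
        echo "missing helper script: $app_dir/bin/nmon_helper.sh"
        return 1
    fi
    # Report whether a local override of nmon.conf is in place
    if [ -f "$app_dir/local/nmon.conf" ]; then
        echo "ok: local/nmon.conf present"
    else
        echo "ok: no local/nmon.conf found"
    fi
}
```

If the check passes, the `splunk cmd` line above can be run as is and the resulting nmon arguments inspected.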

I will be happy to exchange with you by mail, which will be easier and faster. (guilhem.marchand@gmail.com)
