Solved: Splunk Nmon Application - Is there any nice way to remove the CPU load from the universal forwarders?

gjanders · ‎03-23-2017

Overall I would say the nmon application is one of the better, possibly the best, performance tools we have used for monitoring Linux and AIX servers.

However, we do have a minor issue: the forwarders consume a relatively large amount of CPU on some servers. It is relative in that these servers have a very small CPU allocation and nmon uses a large share of it. We have servers that run on 0.1 cores of CPU, and without nmon they appear to run just fine.

After some initial tracing, I believe the cause is the nmon2csv script, which runs on the universal forwarder and translates the nmon data into multiple CSV files that are then indexed. I'm unsure whether the cost comes from the Perl/Python parsing or from the ingestion of a large number of CSV files, but it does require some CPU.
On any server with a reasonable amount of CPU, e.g. over 1 core of entitled CPU, it's no issue, but on the smaller servers it can use 30-50% of the allocated CPU just to do this.

Determining whether it's Splunk ingesting the files or the nmon Perl script doesn't help very much here. What I would have liked to do is offload the work so that a heavy forwarder takes care of running the nmon2csv script.

My initial idea was to trigger a script on the heavy forwarder when it saw the data, as per https://answers.splunk.com/answers/114329/transform-log-file-or-field-at-index-time-using-script-pyt... . Since the nmon TA is doing:

invalid_cause = archive
unarchive_cmd = $SPLUNK_HOME/etc/apps/TA-nmon/bin/nmon2csv.sh --mode realtime
sourcetype = nmon_processing

I cannot run this on the remote heavy forwarder.

One option I noticed in the nmon documentation is to use a syslog forwarding solution to get the files over to a remote location. Since I'm already running the universal forwarder, I'm hoping there is another way to process the files remotely without using rsync or syslog to copy them around.

Any ideas?

Thanks

-
Alerts for Splunk Admins, Version Control for Splunk, Decrypt2 VersionControl For SplunkCloud

guilmxm · ‎03-23-2017

Hi!

Thanks for your interest in the Nmon Performance application; I am glad you find it useful.

To answer your question, I am very happy to inform you that the CPU overhead will be drastically reduced, and will stay constant, with the upcoming release 1.3.0.

The new release is currently under testing for its qualification. The CPU footprint issue has been solved by implementing named pipes (FIFO files).
Nmon binaries will now write to a named pipe instead of regular files, and a constantly running FIFO reader process will retrieve the new data and stream it to the nmon2csv parsers.
As the volume of data streamed at each iteration is very small and no longer grows over time, the CPU, I/O and memory cost is minimal and constant.
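
To illustrate the named pipe idea in general terms, here is a minimal Python sketch. It is not the TA-nmon implementation: the function names (read_fifo, split_nmon_line, demo), the FIFO path and the sample records are all made up for the example, and it assumes a POSIX system where os.mkfifo is available. A writer thread stands in for the nmon binary, and the reader hands each new line to a small parser as it arrives, so the work per iteration depends only on the new data.

```python
import os
import tempfile
import threading


def read_fifo(path, handle_line):
    """Read lines from a named pipe until the writer closes it.

    Each line is passed to handle_line as soon as it is read, so only
    the new data is processed. Returns the number of lines handled.
    """
    count = 0
    with open(path, "r") as fifo:  # blocks until a writer opens the pipe
        for line in fifo:
            handle_line(line.rstrip("\n"))
            count += 1
    return count


def split_nmon_line(line):
    """Split one comma-separated record into (section, fields)."""
    section, _, rest = line.partition(",")
    return section, rest.split(",") if rest else []


def demo():
    """Create a temporary FIFO, feed it two sample records, parse them."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "nmon.fifo")  # hypothetical name
    os.mkfifo(fifo_path)

    def writer():
        # Stand-in for the nmon binary; the records are illustrative only.
        with open(fifo_path, "w") as out:
            out.write("CPU_ALL,T0001,12.5,3.1,0.2,84.2\n")
            out.write("MEM,T0001,2048.0,512.0\n")

    parsed = []
    thread = threading.Thread(target=writer)
    thread.start()
    handled = read_fifo(fifo_path, lambda l: parsed.append(split_nmon_line(l)))
    thread.join()

    os.remove(fifo_path)
    os.rmdir(workdir)
    return handled, parsed


if __name__ == "__main__":
    print(demo())
```

A long-running reader would reopen the pipe in a loop rather than exit when the writer closes it; the sketch stops at end-of-file to keep the example short.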

This new feature and behavior will be available for AIX and Linux; Solaris support will follow.

The real-world tests currently running have already confirmed the stability and the large CPU footprint improvement.

I expect this new release to be available within the next few weeks, and you are more than welcome to participate in its validation:

https://github.com/guilhemmarchand/nmon-for-splunk/tree/testing/resources

If you deploy the testing release, you can kill the running nmon process after the upgrade so that the named pipe process starts immediately.

Besides this, the new release also implements some nice new features:

- The list of key performance monitors to be parsed is now stored in an external JSON file, which allows people to customize it in an upgrade-persistent fashion.
- The new release implements the Nmon external feature. Basically, this allows you to extend Nmon data very easily with anything that matters to you (command output, shell / Perl / Python scripts, external API calls... whatever you want).
- Some minor issue corrections.
- The option to generate the performance data in JSON format instead of the legacy CSV, if you care more about saving storage at the index level than about licensing cost and best performance (about 50% more license cost, 20% less storage cost).
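
As a rough illustration of the license side of that last trade-off, the Python sketch below serializes the same records as CSV and as one JSON event per record and compares the raw byte counts. The field names and values are invented for the example and are not the TA-nmon output format. It assumes the license is measured on raw ingested volume; it says nothing about index storage, and toy data like this will not reproduce the 50% / 20% figures quoted above.

```python
import csv
import io
import json

# Hypothetical field names and sample values, for illustration only.
FIELDS = ["timestamp", "hostname", "user_pct", "sys_pct", "wait_pct", "idle_pct"]
ROWS = [
    ["2017-03-23 10:00:00", "aixhost01", 12.5, 3.1, 0.2, 84.2],
    ["2017-03-23 10:01:00", "aixhost01", 10.0, 2.9, 0.1, 87.0],
    ["2017-03-23 10:02:00", "aixhost01", 15.2, 4.0, 0.3, 80.5],
]


def as_csv(fields, rows):
    """Serialize rows as CSV: the header is written once, then values only."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


def as_json_events(fields, rows):
    """Serialize rows as one JSON object per line: keys repeat in every event."""
    return "".join(json.dumps(dict(zip(fields, row))) + "\n" for row in rows)


def raw_sizes(fields, rows):
    """Return (csv_bytes, json_bytes) for the same records."""
    csv_bytes = len(as_csv(fields, rows).encode("utf-8"))
    json_bytes = len(as_json_events(fields, rows).encode("utf-8"))
    return csv_bytes, json_bytes


if __name__ == "__main__":
    csv_bytes, json_bytes = raw_sizes(FIELDS, ROWS)
    print("csv bytes:", csv_bytes, "json bytes:", json_bytes)
```

The JSON form repeats every field name in every event, which is why its raw volume is larger for the same measurements.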

So it is just a matter of a few weeks before the release is published 😉

Guilhem Marchand

gjanders · ‎03-23-2017

I ran some testing with the new TA-nmon version, and on a single AIX machine the Splunk process is using approximately half the CPU it used previously!

Great work as always!

guilmxm · ‎03-23-2017

Thank you 😉

That's great news.
An update was made tonight to correct the last issues in the new release.

It is very likely ready for final qualification. Feel free to let me know if you observe any issue.

Regards,

Guilhem
