I have 2 servers configured with NMon-TA that are sending data back to my free Splunk instance. Most days it doesn't send too much data (maybe 75% of my limit), but seemingly once a week it will send 150% of my limit. My Splunk instance won't let me search anymore.
Sorry to be short on information, but I don't know what I should include. What kind of information do I need to look at to figure out why my nmon is sending so much data back to my server?
Hi,
Can you give a try to this version of the script:
https://github.com/guilhemmarchand/nmon-for-splunk/blob/testing/resources/TA-nmon/bin/nmon_helper.sh
Replace your current version with this one (ensure it is executable), stop the UF, kill any running nmon instances, and start the UF.
Within a few minutes you should see only one running nmon instance.
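The replacement steps above can be sketched as a short shell sequence. This is only a sketch: the default UF install path /opt/splunkforwarder and the /usr/bin/nmon binary path are assumptions, so adjust them to your environment:

```shell
#!/bin/sh
# Sketch: swap in the new helper script and restart collection.
# Assumes a default Universal Forwarder install under /opt/splunkforwarder.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunkforwarder}
HELPER="$SPLUNK_HOME/etc/apps/TA-nmon/bin/nmon_helper.sh"

if [ -x "$SPLUNK_HOME/bin/splunk" ]; then
    chmod +x "$HELPER"                   # ensure the new script is executable
    "$SPLUNK_HOME/bin/splunk" stop       # stop the UF
    pkill -f '/usr/bin/nmon' || true     # kill any running nmon collectors
    "$SPLUNK_HOME/bin/splunk" start      # start the UF again
else
    echo "No Splunk UF found under $SPLUNK_HOME"
fi
```

After the restart, a single nmon process should appear in ps within a minute or two.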
Please let me know
Hi,
A new version has been published with release 1.5.25; it brings better identification of App-related nmon instances under Linux (among other changes you've tested with the testing version of the script).
This also embeds a new alerting template that will inform you if this happens again (it analyzes the time between nmon instance launches).
Note that the upgrade may temporarily generate 2 running nmon instances for a short moment, as it brings new per-OS-version nmon binaries (if your host doesn't have nmon in the PATH and the App uses the embedded binaries).
You may simply stop these processes, or wait for them to end.
Guilhem
Thanks very much for the quick update! I will upgrade both my Splunk app and the TA app on the forwarder this evening (US time).
I have updated and everything is working without issue.
You're welcome, feel free to inform me in case of trouble
Guilhem
This script is working. It is both adding the PID to the file and also not starting duplicate instances.
One question: Previously I was not using TA-nmon on this machine. I was using the local script in the nmon app itself. Is this script somehow different from the nmon_helper.sh that comes with the non-TA app?
I am going to let this run for a day and I will report back on my license usage. The script has only been running for a few minutes.
Yes, it is the same; I think you had some trouble with the previous version, which was preventing it from identifying nmon processes.
I will release soon a new version including this update.
It seems to be working just fine now. It has never used this little data before. I have used 11MB today (it is 9AM in my server location) for nmon. Vastly different from the 300-400MB it was using per day.
Thank you very much!
I tried to touch the PID file, but every time that nmon starts (again, every minute), it deletes the PID file and does not replace it.
Right, something bad is happening with nmon process identification when the nmon_helper.sh script runs.
Can we exchange by mail? (You can find my mail address at the top of the App's help page, reachable from the App home page in Splunk via the marker icon.)
Edit the script (nmon_helper.sh), uncomment the "# set -x" line to enable shell tracing, and start it manually:
/opt/splunkforwarder/bin/splunk cmd /opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh
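If you prefer not to edit the script, the same trace can be captured by launching the helper under sh -x. This is a sketch: the default UF path and the trace file name are assumptions:

```shell
#!/bin/sh
# Sketch: run nmon_helper.sh with shell tracing and keep the trace.
SPLUNK_HOME=${SPLUNK_HOME:-/opt/splunkforwarder}
HELPER="$SPLUNK_HOME/etc/apps/TA-nmon/bin/nmon_helper.sh"

if [ -f "$HELPER" ]; then
    # run with Splunk's environment, tracing every command to the file
    "$SPLUNK_HOME/bin/splunk" cmd sh -x "$HELPER" 2>/tmp/nmon_helper.trace
    grep -i 'pid' /tmp/nmon_helper.trace | head   # focus on PID-file handling
else
    echo "helper not found at $HELPER"
fi
```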
I have 2 servers: cleteNAS (TA-nmon) and blackwellServer (local collection). I think the issue is on the server named blackwellServer.
Each server is very small (they are both mini-ITX AMD APUs with 3x2TB drives in RAID-5 and run apache/Splunk/some other stuff).
I have your latest version of nmon as of today.
In the logs I see that it is removing stale pid files every minute.
INFO: starting nmon : /usr/bin/nmon -f -T -d 1500 -s 60 -c 120 in /opt/splunk/var/run/nmon/var/nmon_repository
INFO: Removing staled pid file
I also see the processes stacking up:
root 17880 0.0 0.0 15432 1100 ? S 21:55 0:00 /usr/bin/nmon -f -T -d 1500 -s 60 -c 120
root 18396 0.0 0.0 15436 1100 ? S 21:56 0:00 /usr/bin/nmon -f -T -d 1500 -s 60 -c 120
root 18822 0.1 0.0 15280 964 ? S 21:57 0:00 /usr/bin/nmon -f -T -d 1500 -s 60 -c 120
The files in the nmon_repository directory that align to these still-running instances have constantly updating timestamps, which is probably causing Splunk to pull them in multiple times.
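One way to confirm that several writers are active is to count nmon files modified within the last couple of minutes. This is a sketch; the repository path is taken from the nmon_collect log above, and with a 60-second snapshot interval more than one recently touched file means more than one writer:

```shell
#!/bin/sh
# Sketch: count nmon output files updated in the last 2 minutes.
# Path taken from the "starting nmon" log line; adjust if yours differs.
REPO=${REPO:-/opt/splunk/var/run/nmon/var/nmon_repository}

if [ -d "$REPO" ]; then
    recent=$(find "$REPO" -name '*.nmon' -mmin -2 | wc -l)
    echo "$recent nmon file(s) updated in the last 2 minutes"
else
    echo "repository not found at $REPO"
fi
```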
Indeed, I have modified your nmon_processing search. In the past 10 minutes, here are how many records were processed by host:
host count percent
blackwellServer 49 84.482759
cleteNAS 9 15.517241
I am not sure why the nmon processes are stacking up and the PID file is not being created.
Thanks for your detailed answer. I'm afraid I can't search right now due to license violations. I think I'm going to have to start over with my instance or wait 30 days. When I get it working next I'll run those searches and report back.
Hello !
I'm the author of the Nmon Splunk app, thank you for using it 🙂
There must be something bad going on: even 75% of 500 MB per day is very high, especially for only 2 servers, unless you have very big servers like Solaris domains/zones or large AIX partitions, which I guess you don't. It should be something less than 20-30 MB a day.
What is the version of the App you are using ?
Can you verify that there is only one nmon process running at a time on the TA host? (Multiple processes would result in duplicated data.)
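A quick way to check is to count the collector processes. This is a sketch; the /usr/bin/nmon path matches the ps listings in this thread, so adjust it if your nmon binary lives elsewhere:

```shell
#!/bin/sh
# Count running nmon collector processes; more than one at a time
# means the same metrics are collected and indexed several times.
count=$(ps -eo args= | grep -c '^/usr/bin/nmon')
echo "$count nmon instance(s) running"
if [ "$count" -gt 1 ]; then
    echo "WARNING: duplicate nmon instances detected"
fi
```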
The following sourcetype will contain important information too:
index=nmon sourcetype=nmon_collect
--> nmon_helper.sh execution output
index=nmon sourcetype=nmon_processing
--> nmon raw data conversion
Also, you can issue some event counts to understand what's going on, with searches like:
index=nmon earliest=-7d latest=now | timechart span=1d count by sourcetype
--> will count the number of events per sourcetype, 99% of data should be in nmon_data
index=nmon sourcetype=nmon_data earliest=-7d latest=now | timechart span=1d useother=f limit=0 count by type
--> will count the number of events per perf monitor
A simple perf monitor like CPU_ALL (CPU % usage, 1 event per update) should produce around 1 event per minute, so roughly 1440 events per day (there may be some gaps)
index=nmon_performance sourcetype=nmon_data type=CPU_ALL host=myhost | timechart count span=1d by sourcetype
I think your TA is generating too much data for some reason, probably due to duplicated nmon instances?