I have Splunk running on two hosts that were built using a vendor CentOS 6.2 installer. On one host Splunk runs at about 2% CPU. On the other it runs at about 90%. Both use the same base app for reading standard Linux inputs (log files and the Unix TA scripts for processes, RPMs, ports, audit log, etc.).
The machine with the problem generates about one third as many Linux events as the normal server. The only inputs it has that the other doesn't are the ones indexing Nessus plugins and report files, and those don't trigger events very often.
How do I diagnose this performance issue?
Thx.
Craig
You have answered your own question:
I narrowed it down to the input for nessus plugins. That input looks like:
[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules
That directory contains something like 52,000+ .nasl files. I can't even do an ls of the directory because it says the argument list is too long. But if I disable that input, CPU drops to almost nothing.
Splunk is monitoring every file in this directory tree, even files that are not being updated. This takes a lot of CPU.
Ayn's suggestions are great. You might also want to do this on the machine with troubles:
./splunk list monitor
This will give you a sense of what Splunk is trying to monitor. I have found in the past that Splunk starts to slow down somewhere between 5,000 and 10,000 files for a single monitor input. With the current release, the number might be higher. But if you can't ls the directory, how do you expect Splunk to do it?
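If ls with a wildcard fails with "argument list too long", you can still count the files by reading the directory entries directly instead of expanding a shell glob. Here is a minimal Python sketch of that idea. The function name summarize_dir and its parameters are made up for illustration; only the .nasl suffix and the 14-day window come from this thread.

```python
import os
import time


def summarize_dir(path, suffix=".nasl", max_age_days=14, now=None):
    """Return (total, recent): files ending in `suffix`, and how many of
    those were modified within the last `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    total = 0
    recent = 0
    # os.scandir yields directory entries one at a time, so no shell glob
    # is expanded and the argument-list limit does not come into play.
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(suffix):
                total += 1
                if entry.stat().st_mtime >= cutoff:
                    recent += 1
    return total, recent
```

Calling summarize_dir("/opt/nessus/lib/nessus/plugins") on the troubled host should show how many files the monitor stanza matches and roughly how many of them have actually changed recently.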
What can you do?
First, monitor more specific directories if possible.
Second, remove older files from the directory tree. It can be as simple as a script that runs once a day and moves the old files from .../nessus/plugins to .../nessus/plugins.archive
Third, add the following to your monitor stanza
ignoreOlderThan = 14d
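This tells Splunk to ignore files that have not been modified in the last 14 days. Be careful with this setting: a file older than the cutoff is skipped entirely, so any data in it that has not been indexed yet will not be indexed.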
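The once-a-day cleanup script from the second suggestion could look something like the sketch below. This is only an illustration: archive_old_files and its parameters are hypothetical names, the 14-day cutoff is borrowed from the ignoreOlderThan example, and it assumes that moving the files is safe for whatever else reads them, which this thread does not confirm for Nessus.

```python
import os
import shutil
import time


def archive_old_files(src, dest, max_age_days=14, suffix=".nasl", now=None):
    """Move files in `src` that end in `suffix` and were last modified more
    than `max_age_days` days ago into `dest`. Returns the number moved."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    os.makedirs(dest, exist_ok=True)
    # Collect the names first so the directory is not being changed
    # while it is still being scanned.
    with os.scandir(src) as entries:
        stale = [
            entry.name
            for entry in entries
            if entry.is_file()
            and entry.name.endswith(suffix)
            and entry.stat().st_mtime < cutoff
        ]
    for name in stale:
        shutil.move(os.path.join(src, name), os.path.join(dest, name))
    return len(stale)
```

Pass your own plugins directory as src and the archive directory as dest, and run it from cron once a day. Keep the archive directory outside anything Splunk monitors, otherwise the files just get picked up again from the new location.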
You might want to check what status Splunk shows for the affected input, using the script available in http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/
I should also mention that I've had Splunk monitor Nessus plugins on both Windows 2003 and RHEL 5 without this performance issue.
Actually, no... I accidentally updated the answer section rather than the comments. I switched those this morning. Splunk is still running at like 70% CPU all the time even though the directory that contains those 52,000+ files only gets updated nightly (at most).
It seems to me you've answered your own question? 🙂