Monitoring Splunk

Splunkd performance

Builder

I have Splunk running on two hosts that were built with a vendor CentOS 6.2 installer. One Splunk instance runs at about 2% CPU; the other at 90%. They share the same basic app for reading standard Linux log files, plus the Unix TA scripts (processes, RPMs, ports, audit log, etc.).

The machine having problems generates about 1/3 of the Linux events that the normal server does. The only inputs it has that the other doesn't are ones indexing Nessus plugins and report files, but those don't generate events all that often.

How do I diagnose this performance issue?

Thx.

Craig


Re: Splunkd performance

Builder

I narrowed it down to the input for nessus plugins. That input looks like:

[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules

That directory contains something like 52,000+ .nasl files. I can't even do an ls of the directory because it says the argument list is too long. But if I disable that input, CPU drops to almost nothing.
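(For what it's worth, the "argument list too long" error comes from the shell expanding *.nasl into 52,000+ arguments, not from the directory itself; a find pipeline streams the names instead, so you can still count them. Path assumed from the monitor stanza above:)

```shell
# Count the .nasl files without hitting the shell's argument-list
# limit: find streams filenames one at a time rather than expanding
# them all into a single command line.
find /opt/nessus/lib/nessus/plugins -maxdepth 1 -name '*.nasl' 2>/dev/null | wc -l
```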


Re: Splunkd performance

Legend

It seems to me you've answered your own question? 🙂


Re: Splunkd performance

Builder

Actually, no... I accidentally posted in the answer section rather than the comments, and switched those this morning. Splunk is still running at around 70% CPU all the time, even though the directory containing those 52,000+ files only gets updated nightly (at most).


Re: Splunkd performance

Builder

I should also mention that I've had Splunk monitor Nessus plugins on both Windows 2003 and RHEL 5 without this performance issue.


Re: Splunkd performance

Legend

You might want to check what status Splunk shows for the affected input, using the script available in http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/


Re: Splunkd performance

Legend

You have answered your own question:

I narrowed it down to the input for nessus plugins. That input looks like:
[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules
That directory contains something like 52,000+ .nasl files. I can't even do an ls of the directory because it says the argument list is too long. But if I disable that input, CPU drops to almost nothing.

Splunk is monitoring every file in this directory tree, even files that are not being updated. This takes a lot of CPU.
Ayn's suggestions are great. You might also want to do this on the machine with troubles:

./splunk list monitor

This will give you a sense of what Splunk is trying to monitor. I have found in the past that Splunk starts to slow somewhere between 5,000 and 10,000 files for a single monitor input. With the current release, the number might be higher. But if you can't even ls the directory, how do you expect Splunk to keep up with it?

What can you do?

First, monitor more specific directories if possible.

Second, remove older files from the directory tree. It can be as simple as a script that runs once a day and moves the old files from .../nessus/plugins to .../nessus/plugins.archive
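A sketch of such a nightly script (the 30-day cutoff and archive path are just examples; it's demonstrated below against a throwaway directory so it's safe to try as-is, then point PLUGINS and ARCHIVE at the real Nessus paths in your cron job):

```shell
# Move .nasl files not modified in 30+ days out of the monitored
# directory into a sibling archive directory. Demo paths only --
# substitute /opt/nessus/lib/nessus/plugins (and .archive) for real use.
PLUGINS=$(mktemp -d)/plugins
ARCHIVE=${PLUGINS}.archive
mkdir -p "$PLUGINS" "$ARCHIVE"

touch "$PLUGINS/new.nasl"                   # recently modified: keep
touch -d '40 days ago' "$PLUGINS/old.nasl"  # stale: archive

# -mtime +30 matches files last modified more than 30 days ago;
# -exec mv moves each match individually, so no argument-list limit.
find "$PLUGINS" -maxdepth 1 -name '*.nasl' -mtime +30 -exec mv {} "$ARCHIVE/" \;

ls "$PLUGINS" "$ARCHIVE"
```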

Third, add the following to your monitor stanza

ignoreOlderThan = 14d

This tells Splunk to ignore files that have not been modified in the last 14 days. Be careful with this setting: a file that gets skipped because it is too old may not be indexed later, even if it is subsequently updated.
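Putting it together, the revised stanza would look like this (sourcetype and index copied from your original; only the last line is new):

```ini
[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules
# Skip files not modified in the last 14 days
ignoreOlderThan = 14d
```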
