Monitoring Splunk

Splunkd performance

responsys_cm
Builder

I have Splunk running on two hosts that were built using a vendor CentOS 6.2 installer. One Splunk instance runs at about 2% CPU, the other at 90%. They share the same basic app for reading standard Linux inputs (log files and the Unix TA scripts for processes, RPMs, ports, audit log, etc.).

The machine having problems generates about one third of the Linux events that the normal server does. The only inputs it has that the other doesn't are the ones for indexing Nessus plugins and report files, and those don't trigger events all that often.

How do I diagnose this performance issue?

Thx.

Craig

lguinn2
Legend

You have answered your own question:

I narrowed it down to the input for nessus plugins. That input looks like:
[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules
That directory contains something like 52,000+ .nasl files. I can't even do an ls of the directory because it says the argument list is too long. But if I disable that input, CPU drops to almost nothing.

Splunk is monitoring every file in this directory tree, even files that are not being updated. This takes a lot of CPU.

Ayn's suggestions are great. You might also want to run this on the machine that is having trouble:

./splunk list monitor

This will give you a sense of what Splunk is trying to monitor. I have found in the past that Splunk starts to slow down somewhere between 5,000 and 10,000 files for a single monitor input. With the current release, the number might be higher. But if you can't ls the directory, how do you expect Splunk to do it?
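To compare the directory against that rough 5,000 to 10,000 file range, you need a count that does not hit the "argument list too long" error. That error usually comes from the shell expanding a glob such as *.nasl into more arguments than the kernel allows, so a minimal sketch is to let find do the matching instead. The function name count_matching is made up for illustration, and the example path is the one from the monitor stanza above.

```shell
#!/bin/sh
# Hypothetical helper: count regular files in one directory whose names
# match a pattern. find does the matching itself, so the shell never has
# to expand the glob into a huge argument list.
# Assumes file names contain no newlines and that find supports -maxdepth
# (GNU and BSD find do).
count_matching() {
    dir=$1
    pattern=$2
    find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
}

# Example call, using the path from the monitor stanza in this thread:
# count_matching /opt/nessus/lib/nessus/plugins '*.nasl'
```

The pattern must be quoted when you call the function, otherwise the shell expands it before find ever sees it.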

What can you do?

First, monitor more specific directories if possible.

Second, remove older files from the directory tree. It can be as simple as a script that runs once a day and moves the old files from .../nessus/plugins to .../nessus/plugins.archive
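A daily archive script along those lines could look like the sketch below. The function name archive_old_files and the 14-day cutoff in the example are assumptions for illustration, not something Nessus or Splunk provides. Before running anything like this against a live plugin directory, check that whatever writes those files tolerates them being moved.

```shell
#!/bin/sh
# Hypothetical helper: move regular files that have not been modified in
# more than <days> days from <src> to <dest>. Only the top level of <src>
# is touched; subdirectories are left alone.
archive_old_files() {
    src=$1
    dest=$2
    days=$3
    mkdir -p "$dest" || return 1
    # -mtime +N matches files whose age, counted in whole 24-hour
    # periods, is greater than N.
    find "$src" -maxdepth 1 -type f -mtime +"$days" -exec mv {} "$dest"/ \;
}

# Example call (paths follow the ones mentioned in this thread):
# archive_old_files /opt/nessus/lib/nessus/plugins \
#                   /opt/nessus/lib/nessus/plugins.archive 14
```

Run from cron once a day, this keeps the monitored directory limited to recently modified files. The archive directory should sit outside anything the monitor stanza matches.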

Third, add the following to your monitor stanza:

ignoreOlderThan = 14d

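This tells Splunk to ignore files that have not been modified in the last 14 days. Be careful with this setting: a file that is skipped because of its age may not be picked up again even if it is updated later, until Splunk is restarted or the input is reconfigured.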
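Before choosing a threshold, it can help to see how many files would fall on each side of it. The sketch below approximates that with find, using file modification times. The function name age_split is made up for illustration, and Splunk's own evaluation may differ in detail from what find reports.

```shell
#!/bin/sh
# Hypothetical helper: report how many files matching <pattern> in <dir>
# were last modified more than <days> days ago, and how many were not.
# This only approximates an age cutoff from the outside; it does not
# query Splunk.
age_split() {
    dir=$1
    pattern=$2
    days=$3
    minutes=$((days * 24 * 60))
    # -mmin +N matches files last modified more than N minutes ago.
    old=$(find "$dir" -maxdepth 1 -type f -name "$pattern" -mmin +"$minutes" | wc -l | tr -d ' ')
    total=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' ')
    echo "older=$old recent=$((total - old))"
}

# Example call, using the path from the monitor stanza in this thread:
# age_split /opt/nessus/lib/nessus/plugins '*.nasl' 14
```

If almost every file lands in the "older" group, the setting should cut the monitored set sharply. If files you still care about land there too, a longer threshold is the safer choice.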

Ayn
Legend

You might want to check what status Splunk shows for the affected input, using the script available in http://blogs.splunk.com/2011/01/02/did-i-miss-christmas-2/

responsys_cm
Builder

I should also mention that I've had Splunk monitor Nessus plugins on both Windows 2003 and RHEL 5 without this performance issue.

responsys_cm
Builder

Actually, no... I accidentally updated the answer section rather than the comments. I switched those this morning. Splunk is still running at like 70% CPU all the time even though the directory that contains those 52,000+ files only gets updated nightly (at most).

Ayn
Legend

It seems to me you've answered your own question? 🙂

responsys_cm
Builder

I narrowed it down to the input for nessus plugins. That input looks like:

[monitor:///opt/nessus/lib/nessus/plugins/*.nasl]
crcSalt = <SOURCE>
disabled = 0
followTail = 0
sourcetype = nessus_plugins
index = rules

That directory contains something like 52,000+ .nasl files. I can't even do an ls of the directory because it says the argument list is too long. But if I disable that input, CPU drops to almost nothing.
