Getting Data In

What is the recommended threshold for Splunkd process on indexers and syslog forwarders?

hanijamal
New Member

hey guys, what is a good threshold to set for the splunkd process on indexers and syslog forwarders? we are finding that CPU utilization has been slowly creeping up to about 15%. is this normal? concerning? how can this be reduced?

thanks!

1 Solution

lguinn2
Legend

Well, on the indexers, you should expect CPU utilization to rise as the workload (indexing, searching, or replicating) increases.
On the forwarders, workload increases with the amount of data forwarded, but even more with the number of files being monitored.

So for the indexers, I wouldn't worry about it, although you should probably set up the Splunk monitoring console and enable the alerts for high CPU usage, nearing maximum disk, etc. Also, the latest monitoring console has a handy health check that will warn you if a resource appears to be misconfigured.

For the forwarder, 15% CPU utilization would concern me a bit more. First, are you running a heavy forwarder or the universal forwarder? I strongly recommend the universal forwarder if you are forwarding syslog - it is faster and puts less load on the network. Just be sure that you have set limits.conf on the universal forwarder to remove the default bandwidth limit.
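For reference, the universal forwarder's throughput cap lives in the [thruput] stanza of limits.conf; a value of 0 removes it. This is a sketch using the standard file location and the usual 256 KBps default - verify both against the docs for your Splunk version:

```
# $SPLUNK_HOME/etc/system/local/limits.conf on the universal forwarder
[thruput]
# The universal forwarder ships with a cap (typically 256 KBps);
# 0 means unlimited forwarding throughput
maxKBps = 0
```

Restart the forwarder after changing this for it to take effect.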

Second - your syslog forwarder is probably monitoring one or more directories that contain syslog files. What are you doing with the older syslog files, the ones that are no longer being written? On a syslog server, I would set up log rotation that moves files whose last modification time is more than 24 hours old to another directory (one outside the tree that the forwarder is monitoring).
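As a rough sketch of that rotation step (the directory names are placeholders - use your own paths, and make sure the archive directory sits outside the monitored tree), something like this could run nightly from cron:

```shell
# archive_old_logs SRC DEST: move files not modified in the last
# 24 hours (1440 minutes) out of the monitored tree into DEST
archive_old_logs() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  # -mmin +1440 matches files whose mtime is more than 24h in the past
  find "$src" -type f -mmin +1440 -exec mv {} "$dest"/ \;
}

# Example (hypothetical paths):
#   archive_old_logs /var/log/remote-syslog /var/log/syslog-archive
```

Files still being written stay put; only stale ones are moved, so the forwarder's monitored set stops growing without bound.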

The forwarder will continue to monitor all files in the directory tree, regardless of their last mod time, because it doesn't know that the file will never be written again. Over time, this increases the forwarder's workload and is quite wasteful. Cleaning out the older log files will remedy this immediately.

Finally, as the forwarder monitors more and more files, it eventually starts pegging the CPU and becomes non-functional. In older versions of Splunk, this could happen after about 7000 files, more or less. Newer versions of Splunk may be better about this (I don't have recent personal experience), but eventually you will still hit a limit if you don't manage the log files.
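A complementary option, if your forwarder version supports it, is the ignoreOlderThan setting on the monitor input, which tells the forwarder to skip files past a given age. One caveat to be aware of: once a file is ignored it is generally not re-checked even if it is written to again, so treat this as a supplement to log rotation, not a replacement. The monitored path below is hypothetical:

```
# $SPLUNK_HOME/etc/system/local/inputs.conf on the forwarder
# (hypothetical monitored path - substitute your own)
[monitor:///var/log/remote-syslog]
sourcetype = syslog
# Skip files whose modification time is more than 1 day old
ignoreOlderThan = 1d
```

Check the inputs.conf reference for your version before relying on this setting.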

