Getting Data In

What is the recommended threshold for Splunkd process on indexers and syslog forwarders?

hanijamal
New Member

hey guys, what is a good threshold to set for the splunkd process on indexers and syslog forwarders? we are finding that CPU utilization has been slowly creeping up to about 15%. is this normal? concerning? how can this be reduced?

thanks!

1 Solution

lguinn2
Legend

Well, on the indexers, you should expect CPU utilization to rise as the workload (indexing, searching, or replicating) increases.
On the forwarders, workload increases with the amount of data forwarded, but even more with the number of files being monitored.

So for the indexers, I wouldn't worry about it, although you should probably set up the Splunk monitoring console and enable the alerts for high CPU usage, nearing maximum disk, etc. Also, the latest monitoring console has a handy health check that will warn you if a resource appears to be misconfigured.

For the forwarder, 15% CPU utilization would concern me a bit more. First, are you running a heavy forwarder or the universal forwarder? I strongly recommend the universal forwarder if you are forwarding syslog - it is faster and puts less load on the network. Just be sure that you have set limits.conf on the universal forwarder to remove the default bandwidth limit.
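For reference, the universal forwarder's throughput cap lives in the [thruput] stanza of limits.conf; a value of 0 removes it. This is a sketch using the standard file location and the usual 256 KBps default - verify both against the docs for your Splunk version:

```
# $SPLUNK_HOME/etc/system/local/limits.conf on the universal forwarder
[thruput]
# The universal forwarder ships with a cap (typically 256 KBps);
# 0 means unlimited forwarding throughput
maxKBps = 0
```

Restart the forwarder after changing this for it to take effect.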

Second - your syslog forwarder is probably monitoring one or more directories that contain syslog files. What are you doing with the older syslog files, the ones that are no longer being written? On a syslog server, I would set up log rotation that moves files whose last modification time is more than 24 hours old to another directory (one outside the tree that the forwarder is monitoring).
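As a rough sketch of that rotation step (the directory names are placeholders - use your own paths, and make sure the archive directory sits outside the monitored tree), something like this could run nightly from cron:

```shell
# archive_old_logs SRC DEST: move files not modified in the last
# 24 hours (1440 minutes) out of the monitored tree into DEST
archive_old_logs() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  # -mmin +1440 matches files whose mtime is more than 24h in the past
  find "$src" -type f -mmin +1440 -exec mv {} "$dest"/ \;
}

# Example (hypothetical paths):
#   archive_old_logs /var/log/remote-syslog /var/log/syslog-archive
```

Files still being written stay put; only stale ones are moved, so the forwarder's monitored set stops growing without bound.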

The forwarder will continue to monitor all files in the directory tree, regardless of their last mod time, because it doesn't know that the file will never be written again. Over time, this increases the forwarder's workload and is quite wasteful. Cleaning out the older log files will remedy this immediately.

Finally, as the forwarder monitors more and more files, it eventually starts pegging the CPU and becomes non-functional. In older versions of Splunk, this could happen after about 7000 files, more or less. Newer versions of Splunk may be better about this (I don't have recent personal experience), but eventually you will still hit a limit if you don't manage the log files.
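A complementary option, if your forwarder version supports it, is the ignoreOlderThan setting on the monitor input, which tells the forwarder to skip files past a given age. One caveat to be aware of: once a file is ignored it is generally not re-checked even if it is written to again, so treat this as a supplement to log rotation, not a replacement. The monitored path below is hypothetical:

```
# $SPLUNK_HOME/etc/system/local/inputs.conf on the forwarder
# (hypothetical monitored path - substitute your own)
[monitor:///var/log/remote-syslog]
sourcetype = syslog
# Skip files whose modification time is more than 1 day old
ignoreOlderThan = 1d
```

Check the inputs.conf reference for your version before relying on this setting.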

