
What is the recommended CPU threshold for the splunkd process on indexers and syslog forwarders?

hanijamal
New Member

Hey guys, what is a good CPU threshold to set for the splunkd process on indexers and syslog forwarders? We are finding that CPU utilization has been slowly creeping up to about 15%. Is this normal? Concerning? How can it be reduced?

Thanks!


lguinn2
Legend

Well, on the indexers, you should expect CPU to be more utilized as the workload (indexing, searching or replicating) increases.
On the forwarders, workload increases with the amount of data forwarded, but even more with the number of files being monitored.

So for the indexers, I wouldn't worry about it, although you should probably set up the Splunk monitoring console and enable the alerts for high CPU usage, disks nearing capacity, etc. Also, the latest monitoring console has a cool health check that will warn you if a resource appears to be misconfigured.

For the forwarder, 15% CPU utilization would concern me a bit more. First, are you running a heavy forwarder or the universal forwarder? I strongly recommend the universal forwarder if you are forwarding syslog: it is faster and puts less load on the network. Just be sure that you have set maxKBps in the [thruput] stanza of limits.conf on the universal forwarder to remove the default bandwidth limit.
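For reference, the limits.conf change mentioned above might look like the fragment below. This is a sketch, not a tuned recommendation: the file location shown in the comment is one common place for local overrides and is an assumption about your deployment, and you may prefer a finite cap over removing the limit entirely.

```ini
# Assumed location: $SPLUNK_HOME/etc/system/local/limits.conf on the universal forwarder
[thruput]
# The universal forwarder ships with a 256 KB/s cap; 0 means no cap.
maxKBps = 0
```

The forwarder has to be restarted for the new value to take effect.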

Second - your syslog forwarder is probably monitoring one or more directories that contain syslog files. What are you doing with the older syslog files, the ones that are no longer being written? On a syslog server, I would set up a log file rotation that moves log files with a last mod time over 24 hours old to another directory (one outside the tree that the forwarder is monitoring).

The forwarder will continue to monitor all files in the directory tree, regardless of their last mod time, because it doesn't know that the file will never be written again. Over time, this increases the forwarder's workload and is quite wasteful. Cleaning out the older log files will remedy this immediately.
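The rotation described above could be sketched in Python roughly as follows. This is only an illustration: the function name `archive_old_logs` and both directory arguments are hypothetical, the 24-hour cutoff comes from the suggestion above, and the archive directory is assumed to sit outside the monitored tree. In practice many sites would do the same thing with logrotate or a cron job.

```python
import shutil
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # 24 hours, per the suggestion above


def archive_old_logs(monitored_dir, archive_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Move files not modified within max_age seconds out of monitored_dir.

    archive_dir must be outside monitored_dir, otherwise the forwarder
    would keep watching the moved files. Returns the destination paths.
    """
    monitored = Path(monitored_dir)
    archive = Path(archive_dir)
    now = time.time() if now is None else now
    moved = []
    # sorted() materializes the listing first, so moving files
    # does not disturb the directory walk.
    for path in sorted(monitored.rglob("*")):
        if not path.is_file():
            continue
        if now - path.stat().st_mtime <= max_age:
            continue  # recently written, leave it for the forwarder
        # Keep the relative layout (for example per-host subdirectories).
        target = archive / path.relative_to(monitored)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

A call such as `archive_old_logs("/var/log/remote", "/var/log/remote-archive")` (both paths are made-up examples) could be run hourly from cron.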

Finally, as the forwarder monitors more and more files, it eventually starts maxing out the CPU and becomes non-functional. In older versions of Splunk, this could happen after about 7000 files, more or less. Newer versions of Splunk may be better about this (I don't have recent personal experience), but eventually you will still hit a limit if you don't manage the log files.
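To see how close a monitored tree is to that kind of limit, a quick count of the files under it can help. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the default budget of 7000 is just the rough figure quoted above for older versions, not a documented Splunk limit.

```python
from pathlib import Path


def count_monitored_files(monitored_dir):
    """Count regular files anywhere under monitored_dir."""
    return sum(1 for p in Path(monitored_dir).rglob("*") if p.is_file())


def over_file_budget(monitored_dir, budget=7000):
    """Return True when the tree holds more files than the chosen budget.

    The default of 7000 is only the ballpark mentioned for older
    Splunk versions; pick a value that matches your own experience.
    """
    return count_monitored_files(monitored_dir) > budget
```

Running this from the same cron job that archives old logs gives an early warning before the forwarder starts to struggle.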

