Monitoring saturation of event-processing queues in Heavy Forwarders
I have a distributed environment with multiple indexers, search heads, and a pair of heavy forwarders. Over the last few days, one of my heavy forwarders started raising an issue: the Monitoring Console's Health Check warns about "Saturation of event-processing queues". On top of that, the heavy forwarder's performance has dropped sharply, delaying event delivery and causing script executions to fail. splunkd is using 100% of its CPU core all the time.
The docs (Identify and triage indexing performance problems) suggest determining the queue fill pattern through Monitoring Console > Indexing > Indexing Performance: Instance. However, that view seems to apply only to indexers, not to heavy forwarders.
How can I find out what is causing this issue? How can I monitor it? How can I see when it starts and how long it lasts, so I can correlate it with the behavior of other systems? Is this information available in the Monitoring Console?
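One way to get a queue fill pattern on the heavy forwarder itself is to read its own metrics.log (normally $SPLUNK_HOME/var/log/splunk/metrics.log), which records periodic `group=queue` samples per queue. The sketch below is hypothetical, not a Splunk tool: it assumes lines shaped like `... INFO  Metrics - group=queue, name=parsingqueue, blocked=true, max_size_kb=512, current_size_kb=511, ...` (with `blocked=true` present only when the queue is blocked) and treats a sample as saturated when it is blocked or at least 90% full, an arbitrary threshold. It groups consecutive saturated samples per queue into episodes with a start and end time, which is what you need to correlate with other systems.

```python
import re
from datetime import datetime

# Assumed metrics.log timestamp prefix, e.g. "03-14-2024 10:15:42.123 +0000"
TS_RE = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})")
# Comma-separated key=value pairs, e.g. "name=parsingqueue, max_size_kb=512"
KV_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_queue_line(line):
    """Return (timestamp, queue_name, fill_pct, blocked), or None for non-queue lines."""
    if "group=queue" not in line:
        return None
    ts_match = TS_RE.match(line)
    if not ts_match:
        return None
    ts = datetime.strptime(ts_match.group(1), "%m-%d-%Y %H:%M:%S.%f %z")
    fields = dict(KV_RE.findall(line))
    max_kb = float(fields.get("max_size_kb", 0))
    cur_kb = float(fields.get("current_size_kb", 0))
    fill = 100.0 * cur_kb / max_kb if max_kb else 0.0
    blocked = fields.get("blocked") == "true"
    return ts, fields.get("name"), fill, blocked


def saturation_episodes(lines, threshold=90.0):
    """Group consecutive saturated samples per queue into (queue, start, end) tuples."""
    open_start = {}  # queue name -> timestamp of the first saturated sample
    last_seen = {}   # queue name -> timestamp of the latest saturated sample
    episodes = []
    for line in lines:
        parsed = parse_queue_line(line)
        if parsed is None:
            continue
        ts, name, fill, blocked = parsed
        if blocked or fill >= threshold:
            open_start.setdefault(name, ts)
            last_seen[name] = ts
        elif name in open_start:
            # Queue drained below the threshold: close its episode
            episodes.append((name, open_start.pop(name), last_seen.pop(name)))
    # Episodes still open when the input ends
    for name, start in open_start.items():
        episodes.append((name, start, last_seen[name]))
    return episodes
```

Usage: open the forwarder's metrics.log (and its rotated copies, if needed) and pass the file object to `saturation_episodes`; each tuple gives the queue name, when saturation started, and the last saturated sample. If the forwarder sends its `_internal` logs to the indexers, the same `group=queue` samples should also be searchable there.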
You should obviously try to find the root cause, but keep in mind that the default queue sizes are tiny. We ended up giving the indexers something like the following in $SPLUNK_HOME/etc/system/local/server.conf: