Monitoring saturation of event-processing queues in Heavy Forwarders
I have a distributed environment with multiple indexers, search heads, and a pair of heavy forwarders. Over the last few days, one of my heavy forwarders has started to raise an alert: the Monitoring Console's Health Check warns about "Saturation of event-processing queues". In addition, the heavy forwarder's performance has degraded considerably, delaying event delivery and causing script executions to fail. splunkd is consuming 100% of its CPU core all the time.
Checking the docs ("Identify and triage indexing performance problems"), they suggest determining the queue fill pattern through Monitoring Console > Indexing > Indexing Performance: Instance. But it seems this applies only to indexers, not to heavy forwarders.
Please, how could I discover what is causing this issue? How could I monitor it? How can I see when it starts and how long it lasts, so I can correlate it with the behavior of other systems? Is this info available in the Monitoring Console?
Thanks in advance and regards,
Tiago
In Alerts For Splunk Admins I have an alert called "IndexerLevel - Indexer Queues May Have Issues" (refer to the GitHub location if you don't want to download the app).
The Monitoring Console covers this under "Use the monitoring console to view indexing performance".
In terms of finding a cause, there are various posts on the Answers site; try a Google search for a start.
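If you want to watch the queues on the heavy forwarder itself, a minimal sketch of a search against the forwarder's own metrics.log is below. It assumes the forwarder sends its _internal data to the indexers (the usual setup when forwarding of internal logs is enabled), and my_heavy_forwarder is a placeholder for your host name. The first search charts the fill percentage per queue over time, which shows when the saturation starts and how long it lasts; the second counts outright blocked-queue events.

index=_internal host=my_heavy_forwarder sourcetype=splunkd source=*metrics.log* group=queue
| eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
| timechart span=5m perc95(fill_pct) by name

index=_internal host=my_heavy_forwarder sourcetype=splunkd source=*metrics.log* group=queue blocked=true
| timechart span=5m count by name

Queues that sit near 100% point at where the pipeline is backing up; on a heavy forwarder doing a lot of props/transforms work, that is often the parsing or typing queue. You can cross-reference the timechart with your other systems' behavior over the same time range.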
You should obviously try to find the root cause, but keep in mind that the default queue sizes are tiny. We ended up configuring the indexers with something like the following in $SPLUNK_HOME/etc/system/local/server.conf:
[queue=AEQ]
maxSize = 200MB
[queue=parsingQueue]
# Default maxSize = 6MB
maxSize = 3600MB
[queue=indexQueue]
maxSize = 4000MB
[queue=typingQueue]
maxSize = 2100MB
[queue=aggQueue]
# Default maxSize = 1MB
maxSize = 3500MB
This memory buffer helped us remain stable during peak usage times.
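If you do bump the queue sizes, it helps to confirm what the instance actually picked up after a splunkd restart. A quick check with btool on a *nix host might look like this (a sketch; the stanza names simply match the example above, adjust for your own config):

$SPLUNK_HOME/bin/splunk btool server list --debug | grep -E "\[queue|maxSize"

Keep in mind that the extra megabytes are held in memory, and with multiple ingestion pipelines each pipeline set gets its own queues, so size them against the RAM actually available on the box.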