What is causing the following warning from the Monitoring Console's Health Check: "Saturation of event-processing queues"?

tcmarquesi · 10-31-2018

Monitoring saturation of event-processing queues in Heavy Forwarders

I have a distributed environment with multiple indexers, search heads, and a pair of heavy forwarders. Over the last few days, one of my heavy forwarders has started raising an issue: the Monitoring Console's Health Check warns about "Saturation of event-processing queues". On top of that, the heavy forwarder's performance has dropped sharply, delaying event delivery and causing script executions to fail, and splunkd is consuming 100% of its CPU core all the time.

The docs (Identify and triage indexing performance problems) suggest determining the queue fill pattern through Monitoring Console > Indexing > Indexing Performance: Instance. However, that view seems to apply only to indexers, not to heavy forwarders.

How can I find out what is causing this issue, and how can I monitor it? How can I see when it starts and how long it lasts, so I can correlate it with the behavior of other systems? Is this information available in the Monitoring Console?

Thanks in advance and regards,

Tiago

gjanders · 10-31-2018

In Alerts For Splunk Admins I have an alert called IndexerLevel - Indexer Queues May Have Issues (refer to the GitHub location if you don't want to download the app).

The Monitoring Console covers this in "Use the monitoring console to view indexing performance".
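The Monitoring Console's Indexing Performance views are built around indexers, but every splunkd instance, heavy forwarders included, writes queue metrics to its own metrics.log (on a default install, $SPLUNK_HOME/var/log/splunk/metrics.log). Below is a minimal Python sketch of how the fill pattern, plus the start and duration of each saturation period, could be pulled out of that file. It assumes the usual "group=queue" lines carrying name, max_size_kb, current_size_kb and, when applicable, blocked=true, each line starting with a timestamp such as 10-31-2018 10:00:00.123; check these against your own file. The 90% threshold and the function names are illustrative only, not part of any Splunk API.

```python
import re
from datetime import datetime

# Pulls key=value pairs out of a metrics.log line, e.g. "name=parsingqueue,".
KV_RE = re.compile(r"(\w+)=([^,\s]+)")
# Assumed layout of the timestamp that starts each line: "10-31-2018 10:00:00.123".
TS_FORMAT = "%m-%d-%Y %H:%M:%S.%f"


def parse_queue_sample(line):
    """Return (timestamp, queue, fill_pct, blocked) for a group=queue line, else None."""
    if "group=queue" not in line:
        return None
    fields = dict(KV_RE.findall(line))
    if "max_size_kb" not in fields or "current_size_kb" not in fields:
        return None
    max_kb = float(fields["max_size_kb"])
    cur_kb = float(fields["current_size_kb"])
    ts = datetime.strptime(" ".join(line.split()[:2]), TS_FORMAT)
    fill_pct = 100.0 * cur_kb / max_kb if max_kb else 0.0
    return ts, fields.get("name", "?"), fill_pct, fields.get("blocked") == "true"


def saturation_episodes(samples, threshold=90.0):
    """Group consecutive saturated samples per queue into (queue, start, end) tuples.

    A sample counts as saturated when its fill is at or above `threshold`
    percent or the line was flagged blocked=true.
    """
    episodes = []
    started = {}    # queue -> timestamp of the first saturated sample in the open episode
    last_seen = {}  # queue -> timestamp of the latest saturated sample
    for ts, queue, fill_pct, blocked in sorted(samples):
        if fill_pct >= threshold or blocked:
            started.setdefault(queue, ts)
            last_seen[queue] = ts
        elif queue in started:
            episodes.append((queue, started.pop(queue), last_seen[queue]))
    # Episodes still open at the end of the log are closed at their last saturated sample.
    for queue, start in started.items():
        episodes.append((queue, start, last_seen[queue]))
    return episodes


def report(path, threshold=90.0):
    """Print one line per saturation episode found in the metrics.log at `path`."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        samples = [s for s in map(parse_queue_sample, fh) if s]
    for queue, start, end in saturation_episodes(samples, threshold):
        print(f"{queue}: {start} -> {end} (lasted {end - start})")
```

Running report() against a copy of the heavy forwarder's metrics.log prints per-queue start and end times that can be lined up with other systems' logs. An episode ends at its last saturated sample, so a single saturated sample shows a duration of zero.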

In terms of finding a cause, there are various posts on the Answers site; try this Google search for a start.

-
Alerts for Splunk Admins, Version Control for Splunk, Decrypt2 VersionControl For SplunkCloud

ddrillic · 10-31-2018

You should obviously try to find the cause, but keep in mind that the default queue sizes are tiny. On our indexers we ended up with something like this in $SPLUNK_HOME/etc/system/local/server.conf:

[queue=AEQ]
maxSize = 200MB

[queue=parsingQueue]
# Default maxSize = 6MB
maxSize = 3600MB

[queue=indexQueue]
maxSize = 4000MB

[queue=typingQueue]
maxSize = 2100MB

[queue=aggQueue]
# Default maxSize = 1MB
maxSize = 3500MB

This extra memory buffer helped us stay stable during peak usage.
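Because each queue can grow up to its maxSize, stanzas like the ones above also bound how much RAM splunkd may spend on queues when all of them are full, which is worth comparing with the host's memory before copying such values. Below is a small Python sketch that adds up the memory-sized maxSize values. It assumes Python's configparser reads a simple server.conf like this one well enough (Splunk .conf files are not strictly INI), and that, as server.conf.spec describes, a bare integer maxSize is an event count rather than a memory size, so such values are skipped. queue_memory_mb is a hypothetical helper name.

```python
import configparser
import re

# Multipliers converting a maxSize memory unit to MB.
UNITS_MB = {"KB": 1 / 1024, "MB": 1, "GB": 1024}
# Matches memory-sized values such as "3600MB"; bare integers (event counts) do not match.
SIZE_RE = re.compile(r"^(\d+)\s*(KB|MB|GB)$", re.IGNORECASE)


def queue_memory_mb(conf_text):
    """Return {stanza: size_in_MB} for queue stanzas whose maxSize carries a KB/MB/GB unit."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    sizes = {}
    for section in parser.sections():
        if section != "queue" and not section.startswith("queue="):
            continue
        # configparser lower-cases option names, so "maxSize" matches the file's key.
        value = parser.get(section, "maxSize", fallback=None)
        match = SIZE_RE.match(value.strip()) if value else None
        if match:
            number, unit = match.groups()
            sizes[section] = int(number) * UNITS_MB[unit.upper()]
    return sizes
```

For the five stanzas above the total is 200 + 3600 + 4000 + 2100 + 3500 = 13400 MB, roughly 13 GB of potential queue memory per indexer if every queue fills at once.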

