I have a distributed Splunk architecture with 5 indexers and 3 search heads in a search head cluster, and I am currently facing an issue.
On the Distributed Management Console, everything looks normal except for the alert notification below:
DMC Alert - Saturated Event-Processing Queues - One or more of your indexer queues is reporting a fill percentage, averaged over the last 15 minutes, of 90% or more.
At the same time, the following message appeared in the search head cluster's web UI:
Search peer indexer1 has the following message: Too many streaming errors to target=indexer2. Not rolling hot buckets on further errors to this target. (This condition might exist with other targets too. Please check the logs)
Please advise on what I should do in this situation.
Unfortunately, I don't have a solution to this issue. I've been having it for some time now and still haven't identified the root cause or a fix. I've already engaged Splunk Support to assist with troubleshooting and have tried several configuration changes: increasing the ulimit, changing the maxKBps throughput, and changing the MAXEVENTS settings. However, the issue persists. The weird thing is that it only happens on half of the indexers. There are 10 indexers in total in our Splunk infrastructure: 5 old indexers and 5 newly added for expansion. The hardware specs of the servers are almost identical, but for some reason the "Saturated Event-Processing Queues" alert only fires for the 5 new indexers. Whenever this happens, the affected indexers are still searchable, but indexing on them stops and the load is distributed to the remaining healthy indexers.
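In case it helps anyone narrow down which queue is actually filling up on an affected indexer, here is a rough Python sketch that averages queue fill percentages from metrics.log lines. It is only an approximation of what the DMC alert measures, not its actual implementation. It assumes the usual `group=queue` line layout with `name`, `max_size_kb` and `current_size_kb` fields, and it assumes the caller has already restricted the input to the time window of interest (for example, the last 15 minutes). The function names and the 90% default threshold are my own choices for illustration.

```python
import re
from collections import defaultdict

# Matches key=value pairs such as "name=indexqueue" or "current_size_kb=499".
KV_PATTERN = re.compile(r"(\w+)=([^,\s]+)")


def queue_fill_percentages(lines):
    """Return {queue_name: average fill percentage} for group=queue lines.

    Assumption: each queue line carries name, max_size_kb and
    current_size_kb fields; lines without them are skipped.
    """
    samples = defaultdict(list)
    for line in lines:
        fields = dict(KV_PATTERN.findall(line))
        if fields.get("group") != "queue":
            continue
        try:
            name = fields["name"]
            max_kb = float(fields["max_size_kb"])
            cur_kb = float(fields["current_size_kb"])
        except (KeyError, ValueError):
            continue  # malformed or incomplete line
        if max_kb > 0:
            samples[name].append(100.0 * cur_kb / max_kb)
    return {name: sum(vals) / len(vals) for name, vals in samples.items()}


def saturated_queues(lines, threshold=90.0):
    """Return the sorted names of queues whose average fill is >= threshold."""
    averages = queue_fill_percentages(lines)
    return sorted(name for name, pct in averages.items() if pct >= threshold)
```

To use it, read the lines from the indexer's metrics.log (normally under `$SPLUNK_HOME/var/log/splunk/`), filter them to the window you care about, and pass them to `saturated_queues`. Comparing the output from an old indexer and a new one may show whether the same queue saturates first on every affected host.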
For now, the band-aid approach is to restart the new indexers whenever the alert is triggered. This has become an annoying and painful daily process. I'd really appreciate hearing from anyone who has encountered this issue, identified the root cause, and resolved it.
Thank you very much.
Is there any further update on this issue? We are seeing the same pattern with 6.5.3.
I've found useful information in the following articles: