We've run into an issue after upgrading Splunk from 7.3.2 to 8.0.5. We're in a clustered environment with 3 indexers and 3 search heads, and we're forcing Python 3.7 on all of the Splunk servers.
Since the upgrade, the indexing queues on all 3 indexers have been full, as you can see in the screenshot below.
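For context, we've been quantifying the queue saturation with a search along these lines (a rough sketch against the internal metrics log; the queue name and field names are what we see in our metrics.log and may vary by version):

```
index=_internal source=*metrics.log group=queue name=indexqueue
| eval pct_full = round(100 * current_size_kb / max_size_kb, 1)
| timechart avg(pct_full) by host
```

All three indexers sit at or near 100% shortly after a restart.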
There have been no changes to the amount of data we're ingesting since the upgrade; however, a few of the apps also needed to be upgraded to be Python 3.7 compatible.
Here is what we've tried:
Restarting - Clears the queues for a little while, but they inevitably get blocked again
Increasing the queue sizes - We've increased the queue sizes from the default to 80MB, which increased the time until the queues became blocked. Notably, one indexer would block first, and the others would become blocked a few minutes later
Validating permissions - We've checked all the relevant file permissions and ownership
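For reference, the queue size increase was applied roughly like this in server.conf on each indexer (a sketch of our change; the queue names shown are the pipelines we saw blocked, and your stanza names may differ):

```ini
# server.conf on each indexer - raise in-memory queue capacity
# from the defaults to 80MB (the value we tested with)
[queue=parsingQueue]
maxSize = 80MB

[queue=indexQueue]
maxSize = 80MB
```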
There have been two things of note that could be related to this issue:
This graph shows that the indexing pipeline activity correlates directly with the FwdDataReceiverThread. Unfortunately, there doesn't seem to be much information about this thread out there. We've also noticed the following errors concerning this thread:
ERROR Watchdog - No response received from IMonitoredThread=0x7fb47f7feb50 within 8000 ms. Looks like thread name='FwdDataReceiverThread' tid=6894 is busy !? Starting to trace with 8000 ms interval.
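To get a sense of how often the watchdog fires and for which threads, we've been tallying these log lines with a small script along these lines (a sketch; the regex is inferred from the error message above, and the splunkd.log path on your system may differ):

```python
import re
from collections import Counter

# Matches splunkd.log Watchdog lines like the one pasted above and
# captures the reported thread name (pattern inferred from that message).
WATCHDOG_RE = re.compile(
    r"ERROR Watchdog - No response received .* thread name='(?P<thread>[^']+)'"
)

def count_watchdog_threads(lines):
    """Count watchdog timeouts per thread name across log lines."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            counts[m.group("thread")] += 1
    return counts

# Example with the log line from above plus an unrelated metrics line.
sample = [
    "ERROR Watchdog - No response received from IMonitoredThread=0x7fb47f7feb50 "
    "within 8000 ms. Looks like thread name='FwdDataReceiverThread' tid=6894 is "
    "busy !? Starting to trace with 8000 ms interval.",
    "INFO Metrics - group=queue, name=indexqueue, blocked=true",
]
print(count_watchdog_threads(sample))
# In our logs, FwdDataReceiverThread dominates the counts.
```

In practice we run this over $SPLUNK_HOME/var/log/splunk/splunkd.log on each indexer; nearly all of the hits are FwdDataReceiverThread.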
There have also been a number of crashlogs since the upgrade on the Indexers. These crashlogs include items like the following:
The crash seems to be tied to one particular search, so we're not sure whether it's connected to this issue.