We have a support ticket open, but I thought I'd also ask the community. Since upgrading our Splunk to 8.0.1 this one HF has been spewing "TcpOutputProc - Possible duplication of events " for most channels. As well as "TcpOutputProc - Applying quarantine to ip=xx.xx.xx.xx port=9998 _numberOfFailures=2"
We upgraded on the 15th near midnight. This is a count of those the errors from that host.
Here is a count from the indexer cluster showing the number of blocked=true events. One would expect these to be similar in count if the indexers were telling the HF to go elsewhere because it's queues were full.
The HF is still "sick" but here are some things we did that seemed to help.
Edited the outputs.conf that this HF used to output to forwarders within it's own site.
Removed useACK=true from outputs.conf
I'm a little concerned about #2 there. We could still be having issues with the outputs, only now the events are being dropped on the floor. In other words the condition may still be present, we have simply turned off the logging by removing useAck.