About 200 Heavy Forwarders are sending data to four indexers.
All Heavy Forwarders are on Splunk version 5.0.2 and the indexers are on 6.0.
For some of the Heavy Forwarders, queues are consistently blocked, causing extreme latency in forwarding data to the indexers. The tcpout queue remains blocked, which in turn blocks the queues upstream of it. The indexers have no backlog at the time.
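Queue blockage on a forwarder can be confirmed from its own metrics.log, where a saturated queue is logged with blocked=true. A minimal check along these lines (the path assumes a default $SPLUNK_HOME layout; the exact field order in metrics.log can vary by version):

```shell
# Count how often each queue on this Heavy Forwarder has reported as blocked.
# A consistently blocked tcpout queue points at the output/network side rather
# than local parsing.
grep "blocked=true" "$SPLUNK_HOME/var/log/splunk/metrics.log" \
  | awk -F'name=' '{split($2, a, ","); print a[1]}' \
  | sort | uniq -c | sort -rn
```

If the tcpout queue tops this list while the indexers show no backlog, the bottleneck is on the forwarder's sending side, which matches the symptoms described here.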
The outgoing throughput from the HFs is around 512 KB/s.
Verified that the throughput limit is set to unlimited.
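For reference, the throughput cap lives in limits.conf, and unlimited corresponds to maxKBps = 0. This is what was verified above; the stanza below is just an illustrative sketch of that setting:

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[thruput]
# 0 means no bandwidth cap on forwarded data.
# (A Universal Forwarder ships with a 256 KB/s default; a Heavy Forwarder
# is unlimited by default.)
maxKBps = 0
```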
As far as a solution goes, the simple recommendation would be to convert the Heavy Forwarders to Universal Forwarders, as throughput on the Universal Forwarder is much better.
In this case the number of full forwarders seems to be around 200. It is possible that the way fully cooked splunk-to-splunk data is handled, i.e. receiving and parsing s2s cooked data in the same thread, is leading to the slow throughput numbers. The imbalance of 200 HWF --> 4 IDX is likely causing the slowdown; typically 200 UF --> 4 IDX would not be an issue.
In case a transition from Heavy Forwarder to Universal Forwarder is not an option, we would expect throughput to improve by adding additional indexer instances on the same physical boxes (assuming the existing instances are barely doing any work). This simply allows more indexer processes to handle splunk-to-splunk data (unfortunately there is just one thread doing splunk-to-splunk deserialization work per instance).
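If extra indexer instances are added on the existing boxes, the forwarders' outputs.conf only needs the additional host:port pairs in the target group so that load balancing spreads connections across all instances. A sketch under assumed names: the hosts idx1-idx4 and the second port 9998 (for each new instance's splunktcp input) are illustrative, not taken from this environment:

```
# outputs.conf on each Heavy Forwarder (hosts and ports are illustrative)
[tcpout:primary_indexers]
server = idx1:9997, idx1:9998, idx2:9997, idx2:9998, idx3:9997, idx3:9998, idx4:9997, idx4:9998
autoLB = true
```

Each new instance would also need its own SPLUNK_HOME, management port, and splunktcp input configured so it does not collide with the instance already on the box.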
In addition, the recommendation is to upgrade all the Heavy Forwarders to version 5.0.4 or above, as some critical bugs were fixed in 5.0.4 and later.