Problem:
Indexing throughput drops linearly as new data sources, forwarders, or apps are added.
Indexing throughput drops linearly as the number of unique tuples (the cross product of source, sourcetype, and host) grows; anything above roughly 10k tuples is a warning sign.
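As a rough way to gauge tuple cardinality, a search along the following lines counts the distinct (source, sourcetype, host) combinations; index=* and the time range are illustrative and should be narrowed to the indexes receiving the new data.
| tstats count where index=* by source, sourcetype, host | stats count AS unique_tuples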
Run the following search to check whether you are seeing channel explosion:
index=_internal source=*metrics.log new_channels | timechart max(new_channels)
Each tuple can generate several pipeline input channels. Channel churn introduces significant pauses in the ingestion pipeline: the pipeline processors spend a long time managing these channels and therefore ingest less data.
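To narrow down which layer or instance is churning channels (useful for the solutions below), a per-host breakdown of the same metric can help; this is an illustrative variation of the search above.
index=_internal source=*metrics.log new_channels | timechart max(new_channels) by host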
Solutions:
For HEC inputs: increase the following two settings in limits.conf on the indexers (or whichever layer the explosion is on). Generally these values should be more than 2x max(new_channels):
[input_channels]
max_inactive =
* Internal setting, do not change unless instructed to do so by Splunk Support.
lowater_inactive =
* Internal setting, do not change unless instructed to do so by Splunk Support.
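Purely as an illustration of the 2x guideline: if max(new_channels) peaks around 40,000, a limits.conf override on the affected tier might look like the following. The numbers are hypothetical, and since these are internal settings they should only be changed as advised by Splunk Support.
# limits.conf (hypothetical values for a peak of ~40,000 new channels)
[input_channels]
max_inactive = 90000
lowater_inactive = 80000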
For S2S (UF/HF) inputs: increase the following setting on the indexers (or whichever layer the explosion is on):
max_inactive =
* Internal setting, do not change unless instructed to do so by Splunk Support.
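Assuming this is the same [input_channels] stanza in limits.conf shown above (the stanza is not named here), an illustrative override would be:
# limits.conf on the receiving tier (stanza assumed, value hypothetical)
[input_channels]
max_inactive = 90000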
On the forwarding side (all UFs/HFs), increase autoLBFrequency up to 180 seconds.
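autoLBFrequency is set in outputs.conf on each forwarder; a minimal sketch, using a placeholder output group name:
# outputs.conf on each UF/HF ("primary_indexers" is a placeholder group name)
[tcpout:primary_indexers]
autoLBFrequency = 180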
On Splunk version 8 and later, the same check can be run with tstats, which reads the indexed new_channels= token via PREFIX() instead of extracting the field at search time and is therefore faster:
| tstats max(PREFIX("new_channels=")) where index=_internal source=*metrics.log by _time