The universal forwarder in our setup forwards data to a single indexer, and maxKBps in limits.conf is left at the default value of 256KBps.
I have generated a graph with the following search:
index=_internal group="tcpin_connections" | timechart span=30m avg(tcp_KBps)
which shows Average TCP KB per second per forwarder.
The graph shows the same pattern every day. The average KBps gradually climbs to somewhere between 430KBps and 660KBps, then suddenly drops to 250KBps, stays there for several hours, and then gradually declines.
During these times there is a delay in the log files appearing in Splunk, which means our alerting and reports are off. The delay lasts the whole time the rate sits at 250KBps and only stops once the KBps starts to drop (when the number of events per second in the logs drops).
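To put a number on the delay, one option is to compare index time with event time. This is only a sketch: `<your_index>` and `<forwarder_host>` are placeholders, not names from our setup, and it assumes the event timestamps are parsed correctly.

```
index=<your_index> host=<forwarder_host>
| eval lag_seconds = _indextime - _time
| timechart span=30m avg(lag_seconds) AS avg_lag_seconds max(lag_seconds) AS max_lag_seconds
```

Plotted over the same time range as the tcp_KBps graph, this should show whether the lag grows during the 250KBps plateau.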
So, going back to my original question: I want to know why the KBps can go as high as 660KBps (not every time, most of the time it peaks lower) and then suddenly drops.
Does Splunk allow bursts of data for a short period and then start throttling to the maxKBps set in limits.conf once the rate has been too high for too long?
Yes, the process for measuring Splunk Enterprise forwarding thruput is a loop: send > check numbers > limit > repeat. Obviously I'm not a dev! This results in spikes of data. More detail than that on the process would have to come through support. In any case, reducing the persistent queue in outputs.conf to something much smaller (~128k) was offered as a way to flatten the peaks, since less data would be queued to send. Play with that on a select group of forwarders and check the results.
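For reference, the two settings involved might look like the fragment below. Treat it as a sketch rather than a recommendation: it assumes the queue meant above is `maxQueueSize` in the `[tcpout]` stanza of outputs.conf, and 128KB is simply the ballpark figure mentioned, not a tested value.

```
# limits.conf on the universal forwarder
[thruput]
# 256 is the universal forwarder default; 0 removes the limit
maxKBps = 256

# outputs.conf on the universal forwarder
[tcpout]
# Assumption: this is the queue referred to above
maxQueueSize = 128KB
```

The forwarder needs a restart to pick up changes to either file.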
The persistent queue explains why the logs are delayed: the logs are being queued, and it takes several hours for the queue to be processed and for the live data to be streamed again.
What it doesn't explain is why the KBps of the forwarder goes up to around 400KBps to 600KBps for a few hours and then drops to 250KBps.