We have the following search -
base search | eval diff= _indextime - _time | eval capturetime=strftime(_time,"%Y-%m-%d %H:%M:%S") | eval indextime=strftime(_indextime,"%Y-%m-%d %H:%M:%S") | table capturetime indextime diff
We see the following -
So, we see a delay of over five hours in indexing. Is there a way to find out where these events "got stuck"? In this case the events are coming from Hadoop servers, and the forwarder processes around half a million files. We would like to know whether the delay is at the forwarder level or on the indexer side.
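One rough way to narrow this down (a sketch, assuming the same base search, and assuming host identifies the originating forwarder) is to break the lag down by forwarder and by indexer:

base search | eval diff = _indextime - _time | stats avg(diff) AS avg_lag max(diff) AS max_lag count by host splunk_server | sort - max_lag

If the large lags cluster on particular hosts, the delay is more likely on the forwarder side; if they cluster on a particular splunk_server, the indexer is the more likely bottleneck.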
ok, I see -
$ find . -name "limits.conf" | xargs grep -i maxKBps
./etc/apps/universal_config_forwarder/local/limits.conf:maxKBps = 0
./etc/apps/SplunkUniversalForwarder/default/limits.conf:maxKBps = 256
./etc/system/default/limits.conf:maxKBps = 0
and then -
$ ./splunk btool --debug limits list | grep maxKBp
/opt/splunk/splunkforwarder/etc/apps/universal_config_forwarder/local/limits.conf maxKBps = 0
I would run a btool command to check which setting actually applies (system/default has the lowest priority):
bin/splunk btool limits list --debug | grep maxKBps
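For reference, that throughput limit lives in the [thruput] stanza of limits.conf on the forwarder. A minimal sketch of unthrottling it (assuming you push it via the local app shown in the btool output above):

# etc/apps/universal_config_forwarder/local/limits.conf on the UF
[thruput]
# 0 removes the 256 KB/s cap that ships in SplunkUniversalForwarder/default
maxKBps = 0

In this case btool already shows maxKBps = 0 winning, so the forwarder is not throughput-limited and the queues are the next place to look.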
I was late/early on that. Check the various queue sizes to see whether there are any high spikes.
index=_internal sourcetype=splunkd source=*metrics.log group=queue | timechart avg(current_size) by name
You can add host=yourUFName to see queue sizes on the UF, and host=Indexer (add more OR conditions to cover all indexers) to see queue sizes on the indexers. You may need to adjust queue sizes based on what you find there. https://answers.splunk.com/answers/38218/universal-forwarder-parsingqueue-kb-size.html
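For example (a sketch; yourUFName and the indexer host names below are placeholders for your own hosts):

index=_internal sourcetype=splunkd source=*metrics.log group=queue host=yourUFName | timechart avg(current_size) by name

index=_internal sourcetype=splunkd source=*metrics.log group=queue (host=indexer01 OR host=indexer02) | timechart avg(current_size) by name

Whichever tier shows queues consistently sitting at their maximum size is where the backlog is building up.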
The aggQueue is where date parsing and line merging happen. This suggests that the event-parsing configuration may be inefficient. What are the sourcetype definitions (props.conf on the indexers) you have for the sourcetypes involved?
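For comparison, an efficient parsing setup usually pins line breaking and timestamp extraction explicitly so the aggregation queue has little work to do. A minimal sketch (the sourcetype name and timestamp format here are hypothetical; match them to your actual Hadoop logs):

# props.conf on the indexers
[hadoop:datanode]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19

Leaving SHOULD_LINEMERGE at its default of true, or making Splunk guess the timestamp format, forces much more work per event in the aggregation/merging pipeline.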