I am trying to recover logs that were not indexed over the Christmas break because a forwarder went down. The forwarder is back up and we are backfilling the logs, but the ingestion rate is slow: we are ingesting around 1 million logs per hour instead of the usual 10 million or so. I followed the troubleshooting steps in the Splunk documentation and reduced the fetch size, the interval, and other settings, but I don't see any change.
Are there any other approaches or troubleshooting techniques I can try? Thanks in advance for the help.
@vrmandadi - Were you able to test out hunters' solution? Did it work? If yes, please don't forget to resolve this post by clicking on "Accept". If you still need more help, please provide a comment with some feedback. Thanks!
I think you should first determine where the bottleneck in data ingestion is.
In the Monitoring Console, go to the Indexing Performance dashboards (Instance or Deployment). The panels there give you a good picture of indexing performance across all the components in the indexing pipeline set. The Median Fill Ratio of Data Processing Queues panel is especially helpful for locating the bottleneck.
You can also take a closer look at metrics.log, which samples Splunk activity every 30 seconds and reports the top 10 items in each category. It gives you a picture of the whole topology, including forwarding throughput and indexing throughput (logged as thruput).
index=_internal source=*metrics.log host=xyz
The log contains several kinds of diagnostic information:
* group – indicates the data type: pipeline, queue, thruput, tcpout_connections, udpin_connections, and mpool
* group=pipeline – reports how often and for how long the pipeline processors run
* group=queue – shows how much data is waiting in each queue to be processed
* current_size – shows how full a queue is; a queue that stays near its maximum points to the bottleneck
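If you only have file access to the forwarder, the same queue check can be run directly against metrics.log. Below is a minimal Python sketch, assuming the group=queue events carry name, max_size_kb and current_size_kb fields (check this against your own metrics.log, since the field set can differ by version). The sample lines and the median_fill helper are made up for illustration; they are not real data or a Splunk API.

```python
import re
from collections import defaultdict
from statistics import median

# Illustrative lines in the shape of metrics.log events (values are invented).
SAMPLE = """\
01-05-2017 10:00:00.000 +0000 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=10, current_size=3
01-05-2017 10:00:00.000 +0000 INFO  Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=499, current_size=1200
01-05-2017 10:00:30.000 +0000 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=20, current_size=6
01-05-2017 10:00:30.000 +0000 INFO  Metrics - group=queue, name=indexqueue, max_size_kb=500, current_size_kb=500, current_size=1250
01-05-2017 10:00:30.000 +0000 INFO  Metrics - group=thruput, name=index_thruput, instantaneous_kbps=12.5
"""

# key=value pairs separated by commas or whitespace
KV = re.compile(r"(\w+)=([^,\s]+)")


def queue_fill_ratios(lines):
    """Collect fill percentages per queue from group=queue events."""
    ratios = defaultdict(list)
    for line in lines:
        fields = dict(KV.findall(line))
        if fields.get("group") != "queue":
            continue  # skip pipeline, thruput, and other groups
        try:
            name = fields["name"]
            current = float(fields["current_size_kb"])
            capacity = float(fields["max_size_kb"])
        except (KeyError, ValueError):
            continue  # event lacks the fields this sketch assumes
        if capacity > 0:
            ratios[name].append(100.0 * current / capacity)
    return ratios


def median_fill(lines):
    """Median fill percentage per queue, rounded to two decimals."""
    return {name: round(median(vals), 2)
            for name, vals in queue_fill_ratios(lines).items()}


if __name__ == "__main__":
    # Print queues from fullest to emptiest.
    for name, pct in sorted(median_fill(SAMPLE.splitlines()).items(),
                            key=lambda item: -item[1]):
        print(f"{name}: {pct}% full (median)")
```

To use it on a real file, pass the open file object instead of SAMPLE.splitlines(). A queue that sits near 100% while the queues after it stay close to empty is usually where the pipeline is stalling.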