If your forwarders are struggling to send data in a timely manner, and the problem started around the time of one of the events below, you can investigate further:
i) New inputs added
- Have you recently added new inputs that could have overloaded the indexers?
If this is the case, try disabling them to see whether the situation improves, then investigate why they caused the problem.
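A quick way to disable a suspect input is in inputs.conf on the forwarder. A minimal sketch; the monitor path below is a placeholder for whatever input you recently added:

    # inputs.conf -- the stanza name is a placeholder for your new input
    [monitor:///var/log/new_app.log]
    disabled = true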
Do you find many log messages from the DateParserVerbose or LineBreakingProcessor components?
If these messages complain about invalid timestamps or line breaking, the input and props configurations are making Splunk struggle to process the data. You will need to correct the configuration for those inputs first (a sketch follows below).
http://docs.splunk.com/Documentation/Splunk/latest/Data/Configuretimestamprecognition
Whenever you see messages from these components, it means a misconfiguration is making it hard for Splunk to process the data properly.
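To look for these messages, a search along these lines over the internal index should work; adjust the time range to the incident window:

    index=_internal sourcetype=splunkd
        (component=DateParserVerbose OR component=LineBreakingProcessor)
    | stats count by host, component

If the messages point at a particular sourcetype, the fix usually lives in props.conf on the parsing tier. A minimal sketch, where the sourcetype name, timestamp format, and lookahead are placeholders to be matched to your own events:

    # props.conf -- [my_sourcetype], TIME_FORMAT and the lookahead are
    # placeholders; match them to your actual data
    [my_sourcetype]
    TIME_PREFIX = ^
    TIME_FORMAT = %Y-%m-%d %H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 19
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)

Setting TIME_FORMAT and LINE_BREAKER explicitly stops Splunk from guessing, which is what produces most DateParserVerbose and LineBreakingProcessor warnings.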
ii) New searches added
- Check for any expensive searches running around the time the messages appear in splunkd.log.
If you see this issue soon after new searches are added, whether by new apps or by Splunk users, try to find the expensive searches in the Monitoring Console. This can happen when those searches block the indexers from accessing the disk in a timely manner.
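Outside the Monitoring Console, the audit index is another way to rank searches by cost. A minimal sketch (total_run_time is reported in seconds):

    index=_audit action=search info=completed
    | sort - total_run_time
    | head 20
    | table _time, user, total_run_time, search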
You also need to verify the performance of the disk subsystem: check whether the IOPS meet the recommended reference hardware specification;
http://docs.splunk.com/Documentation/Splunk/6.0/Installation/Referencehardware
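On Linux, iostat (from the sysstat package) gives a quick sanity check of indexer disk I/O; compare the observed IOPS against the figure in the reference hardware page above:

    # extended device stats every 5 seconds; watch r/s and w/s (IOPS),
    # await (latency), and %util for saturation
    iostat -x 5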
iii) Slow processing of incoming data
This is similar to the point above. You also need to check the indexer queue status to see where the queue blocking starts (a sample search is sketched after this list):
- If it starts at the indexing queue, it could be due to load on the disk subsystem.
- If it starts at the typing queue, it could be due to expensive regex transforms.
- If it starts at the aggregation queue (aggQueue), it could be due to timestamp recognition or line-breaking issues for some inputs.
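Queue fill levels are recorded in metrics.log, so a sketch like this charts them per queue on the indexers (the queue names are the internal ones used in metrics.log):

    index=_internal source=*metrics.log group=queue
        (name=parsingqueue OR name=aggqueue OR name=typingqueue OR name=indexqueue)
    | eval fill_pct = round(current_size_kb / max_size_kb * 100, 1)
    | timechart avg(fill_pct) by name

Queues back up from the bottleneck toward the input, so the furthest-downstream queue that stays full is usually the one to investigate.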
iv) Have you had this kind of situation for a long time?
- Then it is likely caused by a mix of the issues above.
Please check the above. If the problem still persists, or you cannot locate the cause, contact Splunk Support and provide the following:
- Your Splunk deployment architecture
- Diags from your indexers
- The time of the incident, so Support knows where to look in the log files.