One out of the eight indexer has the two queues filled up for a couple of hours - parsing and aggregation queues. What can be done besides waiting for them to clear? I believe we use the default queue sizes, which are relatively small...
index=_internal host=<indexer name> "ERROR" sourcetype=splunkd doesn't show much besides communication errors.
@danielbb You can go to Monitoring Console -> Indexing -> Input -> Data Quality and look for any parsing or aggregation issues (line breaks , timestamps etc.) and if they are more in number you can try fixing that source type, it should help reducing your parsing and aggregation queues.
probably those 2 indexers receiving more data from a particular host most of the time that has parsing or aggregation issues. If all the indexers and configurations are identical, then ideally it should have been same. Do all indexers have similar specifications?
If you got on the monitoring console -> Indexing-> Input-> Data Quality you can see the list of source type
Sourcetype Total Issues Host Count Source Count Line Breaking Issues Timestamp Parsing Issues Aggregation Issues.
You can click a row with highest count in Line Breaking Issues and you will get the detailed information in logs, similarly you can click on timestamp parsing issues count.
If parsing and aggregation queue blocks for longer time then it will start blocking your splunktcpin and tcpin queues and if you are not using
useACK on Forwarder then there might be possibility that you'll lose data over network.
In your case it looks like due to Aggregation Queue full & back pressure parsing queue is also getting full. Have a look at https://wiki.splunk.com/Community:HowIndexingWorks , in aggregation queue Line Merging and Timestamp parsing happens. I'll suggest to define
TIME_FORMAT for as much as log you can so that splunk will parse time stamp quickly.
Also can you please let us know whether typing queue and indexing queue was also full at same time on that indexer ?
Try below query to find out time parsing issue.
index=_internal host=<your Indexer> source="/opt/splunk/var/log/splunk/splunkd.log" component=DateParserVerbose
Even if you don't have timestamp parsing issue, I'll suggest to configure TIME_FORMAT for sources which are ingesting more data so that splunk do not need to find different timeformat in your log.