If you want to track the size of the archive processing queue, you could run the following search:
index=_internal host=host_reading_archives source=*metrics.log group=queue name=aq earliest=-24h | timechart span=15m perc95(current_size), max(max_size)
Feel free to change the values of "earliest" and "span" to better suit your needs.
Note that we are asking for the 95th percentile of the "current_size" field and the maximum value of the "max_size" field to produce a good statistical representation of what's going on.
As for remediating the blockage of the archive processing queue, the answer and comments from Stephen Sorkin in the post you refer to are still valid. I would add a few points:
If at all possible, feed the files to Splunk uncompressed. Uncompressed files can be processed in parallel, whereas archives have to be processed serially.
Find out whether other blocked queues downstream are the actual culprit: a blocked queue propagates congestion upstream. You can check by running a search like
index=_internal host=host_reading_archives source=*metrics.log group=queue | timechart span=15m perc95(current_size) by name
Change "span" or the value of "host" as needed, depending on where (the forwarder reading the archives? the indexer committing events to disk?) and when you want to look at the size of the event-processing queues.
Optimize event-processing for the sourcetypes indexed from the archive files by declaring explicit line-breaking rules (use LINE_BREAKER in props.conf) and time-stamp extractions (use TIME_FORMAT, TIME_PREFIX and MAX_TIMESTAMP_LOOKAHEAD in props.conf). See props.conf.spec - http://www.splunk.com/base/Documentation/latest/Admin/Propsconf - for details.
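As an illustration, here is a minimal props.conf sketch for such explicit rules, assuming a hypothetical sourcetype "archived_app_logs" whose events each begin with a timestamp like "2011-06-15 12:34:56" at the start of a line (adjust the regexes and format string to match your actual data):

```ini
[archived_app_logs]
# Break events at newlines followed by a date; capture group 1 is discarded
LINE_BREAKER = ([\r\n]+)\d{4}-\d{2}-\d{2}
# Explicit line breaking means we can skip the expensive line-merging pass
SHOULD_LINEMERGE = false
# Timestamp sits at the very start of each event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# Only the first 19 characters need to be scanned for the timestamp
MAX_TIMESTAMP_LOOKAHEAD = 19
```

Disabling line merging and bounding the timestamp lookahead removes two of the costlier per-event operations, which can noticeably speed up draining of the processing queues.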
If your architecture allows it, split the task of reading archive files between several Splunk forwarders.
Thanks. What does the size mean? Is there a good way to tell if it has become too big? I enabled my input and the AQ queue jumped to over 1000, while it's normally at 0. I have since disabled it again.
To paraphrase Stephen Sorkin, seeing this queue at 1000 "means that the file processing code has found more than 1000 archive files that we are processing in turn." I would imagine that this is expected in your case.