We have an older Red Hat 5.6 box running the Splunk Universal Forwarder 5.0.2 processing a few directories with many *.gz files. The system seems to be keeping up well enough, but we've noticed a lot of "blocked=true" entries showing up in metrics.log, mainly for the "aeq" queue. Here's a sample:
[root@linux1621 splunk]# grep "name=aeq" metrics.log | tail
08-08-2014 20:01:03.681 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=63, smallest_size=0
08-08-2014 20:01:34.683 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:02:05.562 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=5
08-08-2014 20:02:36.564 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=22
08-08-2014 20:03:07.565 +0000 INFO Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=61, smallest_size=0
08-08-2014 20:03:38.564 +0000 INFO Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=15, smallest_size=15
08-08-2014 20:04:09.402 +0000 INFO Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=61, smallest_size=0
08-08-2014 20:04:40.403 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:05:11.403 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=1
08-08-2014 20:05:42.404 +0000 INFO Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=7
These seem to correlate with messages like the following in splunkd.log:
08-08-2014 20:05:42.836 +0000 INFO BatchReader - Continuing...
08-08-2014 20:05:43.044 +0000 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:43.708 +0000 INFO BatchReader - Continuing...
08-08-2014 20:05:44.057 +0000 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:44.394 +0000 INFO BatchReader - Continuing...
08-08-2014 20:05:45.363 +0000 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:46.339 +0000 INFO BatchReader - Continuing...
08-08-2014 20:05:47.939 +0000 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:48.251 +0000 INFO BatchReader - Continuing...
08-08-2014 20:05:48.459 +0000 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying...
I tried increasing maxKBps in limits.conf (doubled it from 1024 to 2048), but the blocking messages returned right after restart.
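For reference, this is the kind of stanza I changed (I'm assuming the usual $SPLUNK_HOME/etc/system/local/limits.conf location on the forwarder):
[thruput]
# forwarder thruput cap in KB/s, doubled from our previous 1024
maxKBps = 2048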
CPU and RAM on this system look fine: load average is below 1.00 most of the time, and memory is mostly buffers/cache with no swapping.
What is "aeq" and where are it's parameters adjusted? Can we increase the max_size_kb (presumably to 1024)?
Or is this a red herring and we need to look elsewhere?
In your server.conf, add the following (this setting adjusts the default size for all queues):
[queue]
# maxSize accepts a KB, MB, or GB suffix
maxSize = 500MB
You can also target specific queues, but I was unable to get that to work; see the server.conf spec and the sketch below.
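For what it's worth, server.conf.spec describes a per-queue stanza along these lines; I haven't confirmed it takes effect for aeq on a 5.0.x universal forwarder, so treat it as a sketch:
[queue=aeq]
# per-queue override; the value takes a KB, MB, or GB suffix
maxSize = 1MB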
Thanks, it looks like the "aeq" queue is a single-threaded process that decompresses .GZ files, and that's all this system is doing. We're investigating sending uncompressed log files to see if that helps. I'll update when we know more.
aeq appears to feed the archiveProcessor: http://wiki.splunk.com/Community:HowIndexingWorks
That's just a symptom, though, of a bottleneck somewhere further down the pipeline. As you can see from the diagrams, aeq is right at the top. Look for the bottommost queue that's blocked and you have your culprit.
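A quick way to see which queues are reporting blocked=true and how often (assuming a default /opt/splunkforwarder install path):
grep 'group=queue' /opt/splunkforwarder/var/log/splunk/metrics.log \
  | grep 'blocked=true' \
  | sed 's/.*name=\([^,]*\),.*/\1/' \
  | sort | uniq -c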
Hitting the thruput limit in limits.conf produces its own dedicated message in splunkd.log, so you should see that explicitly if you're actually hitting it.
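A simple check (the exact wording of that message varies by version, so this just greps for the thruput processor's output):
grep -i 'thruput' /opt/splunkforwarder/var/log/splunk/splunkd.log | tail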