Getting Data In

What is the queue named "aeq" and how to increase its max_size_kb?

dglinder
Path Finder

We have an older RedHat 5.6 box running the Splunk Universal Forwarder 5.0.2 processing a few directories with many *.gz files. The system seems to be keeping up well enough, but we've noticed the metrics.log has started noting a lot of "blocked=true" showing up, mainly from the "aeq" queue. Here's a sample:

[root@linux1621 splunk]# grep "name=aeq" metrics.log | tail
08-08-2014 20:01:03.681 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=63, smallest_size=0
08-08-2014 20:01:34.683 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:02:05.562 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=5
08-08-2014 20:02:36.564 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=22
08-08-2014 20:03:07.565 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=61, smallest_size=0
08-08-2014 20:03:38.564 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=15, smallest_size=15
08-08-2014 20:04:09.402 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=61, smallest_size=0
08-08-2014 20:04:40.403 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:05:11.403 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=1
08-08-2014 20:05:42.404 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=7

These seem to correlate with errors seen in the splunkd.log file:

08-08-2014 20:05:42.836 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:43.044 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:43.708 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:44.057 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:44.394 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:45.363 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:46.339 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:47.939 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:48.251 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:48.459 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...

I tried increasing the maxKBps in limits.conf (doubled it from 1024 to 2048), but the errors returned right after restart.

The CPU and RAM on this system are doing quite well - system load is below 1.00 most of the time, and RAM is mostly buffers and not swapping.

What is "aeq" and where are it's parameters adjusted? Can we increase the max_size_kb (presumably to 1024)?

Or is this a red herring and we need to look elsewhere?

0 Karma

sjscott
Explorer

In your server.conf add the following: (This setting adjusts all queues)

[queue]
maxSize = 500MB # this can be KB,MB, or GB.

You can also specify specific queues, but I was unable to get it to work. See server.conf spec.

0 Karma

dglinder
Path Finder

Thanks, it looks like the "aeq" queue is a single-threaded process that decompresses .GZ files, and that's all this system is doing. We're investigating sending uncompressed log files to see if that helps. I'll update when we know more.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

aeq appears to feed the archiveProcessor: http://wiki.splunk.com/Community:HowIndexingWorks

That's just a symptom though, of a bottleneck somewhere down the line. As you can see from the diagrams, aeq is right at the top. Look for the bottommost queue that's blocked and you have your culprit.

Hitting the thruput in limits.conf has a dedicated event, so you should see that if you're indeed hitting that.

0 Karma
.conf21 Now Fully Virtual!
Register for FREE Today!

We've made .conf21 totally virtual and totally FREE! Our completely online experience will run from 10/19 through 10/20 with some additional events, too!