Getting Data In

What is the queue named "aeq" and how to increase its max_size_kb?

dglinder
Path Finder

We have an older Red Hat 5.6 box running the Splunk Universal Forwarder 5.0.2, processing a few directories with many *.gz files. The system seems to be keeping up well enough, but metrics.log has started showing a lot of "blocked=true" entries, mainly for the "aeq" queue. Here's a sample:

[root@linux1621 splunk]# grep "name=aeq" metrics.log | tail
08-08-2014 20:01:03.681 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=63, smallest_size=0
08-08-2014 20:01:34.683 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:02:05.562 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=5
08-08-2014 20:02:36.564 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=22
08-08-2014 20:03:07.565 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=61, smallest_size=0
08-08-2014 20:03:38.564 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=482, current_size=15, largest_size=15, smallest_size=15
08-08-2014 20:04:09.402 +0000 INFO  Metrics - group=queue, name=aeq, max_size_kb=500, current_size_kb=0, current_size=0, largest_size=61, smallest_size=0
08-08-2014 20:04:40.403 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=0
08-08-2014 20:05:11.403 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=1
08-08-2014 20:05:42.404 +0000 INFO  Metrics - group=queue, name=aeq, blocked=true, max_size_kb=500, current_size_kb=499, current_size=61, largest_size=61, smallest_size=7

These seem to correlate with errors seen in the splunkd.log file:

08-08-2014 20:05:42.836 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:43.044 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:43.708 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:44.057 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:44.394 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:45.363 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:46.339 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:47.939 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...
08-08-2014 20:05:48.251 +0000 INFO  BatchReader - Continuing...
08-08-2014 20:05:48.459 +0000 INFO  BatchReader - Could not send data to output queue (parsingQueue), retrying...

I tried increasing the maxKBps in limits.conf (doubled it from 1024 to 2048), but the errors returned right after restart.
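For reference, that throughput cap is the maxKBps setting in the [thruput] stanza of limits.conf; the change described above would look like the sketch below (the file location is an assumption based on a typical forwarder install, and the value is the one stated above, not a recommendation):

```
# $SPLUNK_HOME/etc/system/local/limits.conf
[thruput]
# doubled from the previous custom value of 1024
maxKBps = 2048
```

A restart of splunkd is required for the change to take effect.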

The CPU and RAM on this system are doing quite well - system load is below 1.00 most of the time, and RAM is mostly buffers and not swapping.

What is "aeq" and where are it's parameters adjusted? Can we increase the max_size_kb (presumably to 1024)?

Or is this a red herring and we need to look elsewhere?


sjscott
Explorer

In your server.conf, add the following (this setting adjusts all queues):

[queue]
# maxSize accepts a KB, MB, or GB suffix
maxSize = 500MB

You can also specify specific queues, but I was unable to get it to work. See server.conf spec.
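Per the server.conf spec, individual queues can in principle be sized with a [queue=&lt;queueName&gt;] stanza. A sketch for the queue discussed above (untested here, as noted, and assuming the queue is addressable by the name aeq):

```
# $SPLUNK_HOME/etc/system/local/server.conf
[queue=aeq]
# hypothetical value; roughly double the 500 KB seen in metrics.log
maxSize = 1MB
```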


dglinder
Path Finder

Thanks, it looks like the "aeq" queue is a single-threaded process that decompresses .GZ files, and that's all this system is doing. We're investigating sending uncompressed log files to see if that helps. I'll update when we know more.


martin_mueller
SplunkTrust

aeq appears to feed the archiveProcessor: http://wiki.splunk.com/Community:HowIndexingWorks

That's just a symptom, though, of a bottleneck somewhere further down the line. As you can see from the diagrams, aeq sits right at the top of the pipeline. Look for the bottommost queue that's blocked and you have your culprit.
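To find that bottommost blocked queue, a small shell helper can tally blocked=true entries per queue name in metrics.log (a sketch; the function name and the typical forwarder log path in the usage comment are assumptions):

```shell
# blocked_queues: print the queues reporting blocked=true in a metrics.log,
# with occurrence counts, most frequently blocked first.
blocked_queues() {
  grep 'group=queue' "$1" \
    | grep 'blocked=true' \
    | sed 's/.*name=\([^,]*\),.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Typical usage on a Universal Forwarder host:
# blocked_queues /opt/splunkforwarder/var/log/splunk/metrics.log
```

Cross-reference the output against the pipeline diagrams on the wiki page above: the blocked queue furthest downstream is the real bottleneck.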

Hitting the thruput limit in limits.conf logs a dedicated event, so you should see that in splunkd.log if you're indeed hitting it.
