Solved: What does 'maxQueueSize' in outputs.conf do?

mctester · 08-26-2010

We ran a load test over the weekend and hit an issue where one of our forwarder nodes became unresponsive. We now attribute it to a combination of a large maxQueueSize in outputs.conf, all indexer nodes being unreachable, and splunkd consuming all available memory. In the problem case, maxQueueSize was set to 1000000, and a recorded snapshot shows a splunkd process consuming 3 GB:

 maxQueueSize=1000000
 8947 root      15   0 3482m 3.1g 7300 S  2.0 39.4   0:35.23 splunkd

On investigation, I restarted splunkd with maxQueueSize set to 10,000, 1,000, and 100, and memory consumption dropped accordingly:

 maxQueueSize=10000
 11164 root      15   0 2233m 2.1g 7228 S  0.0 26.8   0:10.03 splunkd

 maxQueueSize=1000
 11440 root      15   0  394m 292m 7192 S  0.0  3.7   0:05.34 splunkd

 maxQueueSize=100
 11520 root      15   0  209m 107m 7188 S  0.0  1.4   0:04.50 splunkd

A few questions:

The value of 1,000,000 was chosen in an attempt to maximize indexing efficiency, and we are reverting to the default. What does varying maxQueueSize actually do for us?
Is it expected that a large maxQueueSize can cause splunkd to consume all memory? Is there any sort of safety shut-off that should kick in?

Thanks,

Mick · 08-26-2010

Yes, that behaviour is expected. maxQueueSize controls the number of events that can be held in memory at any one time, and increasing it does not necessarily make indexing any faster or more efficient. If the connection between an indexer and a forwarder goes down, the intended behaviour is for the forwarder to fill its queues with data ready to send, and then block any further incoming data, whether from a file or from a network device. If the value is set too high, the forwarder will consume a lot of resources whenever there is a problem or disconnect.
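To make the blocking behaviour concrete, here is a minimal Python sketch. It is a toy model, not Splunk's code: the `ForwarderQueue` class, its methods, and the 200-byte event size are all hypothetical. It shows that while no indexer is reachable the queue fills up to `max_queue_size` and then refuses new events (the point where a file input would stop reading), so the memory it holds is bounded by the cap rather than by the size of the backlog.

```python
from collections import deque


class ForwarderQueue:
    """Hypothetical toy model of a forwarder's in-memory output queue.

    Not Splunk's implementation: it only illustrates that max_queue_size caps
    how many events can sit in memory while no indexer is reachable.
    """

    def __init__(self, max_queue_size: int) -> None:
        self.max_queue_size = max_queue_size
        self.events = deque()

    def offer(self, event: bytes) -> bool:
        """Enqueue an event, or return False when the queue is full.

        A False return is where a file or TCP input would block (stop reading).
        """
        if len(self.events) >= self.max_queue_size:
            return False
        self.events.append(event)
        return True

    def drain(self, indexer_up: bool, batch: int) -> int:
        """Send up to `batch` queued events if an indexer is reachable."""
        if not indexer_up:
            return 0  # nothing leaves memory while all indexers are down
        sent = 0
        while self.events and sent < batch:
            self.events.popleft()  # stand-in for a successful send
            sent += 1
        return sent

    def bytes_held(self) -> int:
        """Approximate payload memory held by queued events."""
        return sum(len(e) for e in self.events)


# Indexers unreachable: the queue fills to its cap, then the input blocks.
q = ForwarderQueue(max_queue_size=1000)
accepted = 0
for _ in range(1500):
    if not q.offer(b"x" * 200):  # 200-byte events are an arbitrary assumption
        break
    accepted += 1
print(accepted, q.bytes_held())  # 1000 200000

# An indexer comes back: the queue drains and the input can resume.
print(q.drain(indexer_up=True, batch=400), q.offer(b"x" * 200))  # 400 True
```

In this model the memory held during an outage is roughly the cap times the average event size, which is why a cap of 1,000,000 can become expensive once every indexer is unreachable.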

Generally, if your deployment is performing well, there is no reason to increase this beyond the default, as the queue should never even reach 1,000 events. If you were receiving UDP data on your forwarder, however, and it was imperative to capture as much of it as possible during an outage, that would be a reason to set it to a high number. If data retention is a priority, though, I would question the suitability of using UDP in the first place.
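The UDP trade-off can be sketched the same way. In this hypothetical model (the `outage_outcome` function, its `"file"`/`"udp"` labels, the 50,000-event outage, and the 300-byte average event size are assumptions, not Splunk settings), a file input simply pauses when the queue is full and reads the remaining data from disk later, while UDP senders do not wait, so anything that arrives after the queue fills is lost.

```python
def outage_outcome(source: str, events_during_outage: int,
                   max_queue_size: int, avg_event_bytes: int) -> dict:
    """Hypothetical toy model of input handling while every indexer is unreachable."""
    queued = min(events_during_outage, max_queue_size)  # capped by the queue
    overflow = events_during_outage - queued
    if source == "file":
        lost = 0  # the reader pauses; overflow is still on disk for later
    elif source == "udp":
        lost = overflow  # senders don't wait; packets hitting a full queue are dropped
    else:
        raise ValueError(f"unknown source: {source!r}")
    return {
        "queued": queued,
        "lost": lost,
        "approx_bytes_in_memory": queued * avg_event_bytes,
    }


for cap in (1_000, 1_000_000):
    print(cap, outage_outcome("udp", 50_000, cap, avg_event_bytes=300))
# 1000 {'queued': 1000, 'lost': 49000, 'approx_bytes_in_memory': 300000}
# 1000000 {'queued': 50000, 'lost': 0, 'approx_bytes_in_memory': 15000000}
```

Under these assumptions a larger cap trades memory for fewer lost UDP events, while for a file input it only adds memory, which is consistent with staying at the default unless capturing UDP data during outages really matters.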
