Splunk Enterprise

Why shouldn't dropEventsOnQueueFull config be used while monitoring files?

jangusprangus
Engager

In the outputs.conf on our Splunk forwarders we have two tcpout target groups ([tcpout:<target_group>]). Each group contains multiple servers and is auto load-balanced (autoLB).
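For reference, a setup like the one described might look roughly like this. The group names (local_indexers, remote_indexers) and server addresses are hypothetical placeholders. Listing both groups in defaultGroup is what makes the forwarder send a copy of every event to each group.

```ini
# Hypothetical forwarder outputs.conf; group names and hosts are placeholders.
[tcpout]
# Both groups listed: every event is cloned to each target group.
defaultGroup = local_indexers, remote_indexers

[tcpout:local_indexers]
# Events are load-balanced across the servers in this group.
server = idx1.local.example:9997, idx2.local.example:9997
autoLB = true

[tcpout:remote_indexers]
server = idx1.remote.example:9997, idx2.remote.example:9997
autoLB = true
```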

Our second tcpout group (a remote Splunk instance) became unavailable because of a network issue. This caused the local queues on all of our Splunk forwarders to fill up and block forwarding entirely (to both groups), because they could no longer forward data to the second group.

I'm looking into solutions in outputs.conf, namely the tcpout settings maxQueueSize and dropEventsOnQueueFull. A combination of these seems like it would solve our problem; however, the documentation (https://docs.splunk.com/Documentation/Splunk/8.2.5/Admin/Outputsconf) says this under dropEventsOnQueueFull:

* CAUTION: DO NOT SET THIS TO A POSITIVE INTEGER IF YOU ARE
  MONITORING FILES.

I am monitoring files, so this seems like a deal breaker. Can somebody help me understand why we shouldn't configure this setting if we're monitoring files?

Or should we simply set this to 0 (not a positive integer)?

If there's an outage of the second tcpout group, we're fine with losing some data for that site if that's the price of keeping the forwarders continuing to report to our first tcpout group.

Hope that makes sense! Thanks in advance!


PickleRick
SplunkTrust

I suppose that when you're ingesting events from files, especially local files, the forwarder can read them much faster than the output queue can drain (especially if you're enforcing throughput limits on output).

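So under normal circumstances the output queue's throughput sets the overall processing speed: when the output cannot accept more events, the queue blocks and the file input gets throttled. Nothing is lost while it waits, because the unread data simply stays in the file on disk.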

If you set this parameter to a positive value, the output is still limited by throughput limits or simply by network performance, but the inputs keep reading events as fast as they can (and they can do it really fast). The events pass through the input and parsing queues and are pushed to an output queue that has no room for them, so they get dropped. Because the monitor has already read past those events, they are not read again.
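To make the contrast concrete, here is a sketch of the two behaviours for a hypothetical target group named remote_indexers. Per the outputs.conf spec, the default of -1 (and 0) blocks when the queue is full, while a positive value is the number of seconds to wait before discarding new events.

```ini
[tcpout:remote_indexers]
# Default (-1), or 0: block when this group's output queue is full.
# The block propagates up the pipeline and throttles the file monitor,
# but it also stops data from reaching every other target group.
dropEventsOnQueueFull = -1

# Alternative: a positive value N waits N seconds on a full queue, then
# discards new events for this group until the queue has space again.
# dropEventsOnQueueFull = 300
```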

jangusprangus
Engager

Hey @PickleRick,

Thank you for your answer, it was incredibly helpful.

What I've done is add dropEventsOnQueueFull = 300 (seconds) to the second tcpout group only, and increase its maxQueueSize to 50MB. We're going to see how this works in our environment. The idea behind the 300-second timeout is to give the second tcpout group a chance to recover if it's only a minor blip. I hope my understanding is correct there!
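Written out, the change described above might look something like this, assuming the settings go in the second group's [tcpout:<target_group>] stanza as described. The group name and servers are hypothetical placeholders. The timeout is an integer number of seconds, so it is written as 300 rather than 300s.

```ini
# Hypothetical stanza for the second (remote) target group only.
[tcpout:remote_indexers]
server = idx1.remote.example:9997, idx2.remote.example:9997
# Larger output queue for this group to absorb short outages.
maxQueueSize = 50MB
# On a full queue, wait 300 seconds, then drop new events for this group
# until the queue has room again.
dropEventsOnQueueFull = 300
```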

Thanks again for your answer.
