Splunk Enterprise

Why shouldn't dropEventsOnQueueFull config be used while monitoring files?

jangusprangus
Engager

In the outputs.conf for our Splunk forwarders we have two tcpout target groups ([tcpout:<target_group>]). Both tcpout groups have multiple servers and are auto load-balanced.

Our second tcpout group (a remote Splunk instance) became unavailable due to a network issue. Because the forwarders could no longer send data to the second group, their local queues filled up and forwarding was blocked completely, for both groups.

I'm looking into solutions using outputs.conf, namely the tcpout settings maxQueueSize and dropEventsOnQueueFull. A combination of these seems like it will solve our problem. However, the documentation (https://docs.splunk.com/Documentation/Splunk/8.2.5/Admin/Outputsconf) says this under dropEventsOnQueueFull:

* CAUTION: DO NOT SET THIS TO A POSITIVE INTEGER IF YOU ARE
  MONITORING FILES.

I am monitoring files, so this seems like a deal breaker? Can somebody help me understand why we wouldn't want to configure this setting if we're monitoring files?

Or should we simply set this to 0 (not a positive integer)?

If there's an outage of the second tcpout group, we're fine with losing some data for that site if that's the price of keeping the forwarders continuing to report to our first tcpout group.

Hope that makes sense! Thanks in advance!

 


PickleRick
SplunkTrust

I suppose if you're ingesting events from files, especially local files, the rate of reading the files is very high compared to the rate at which the output queue drains (especially if you're enforcing throughput limits on output).

So in normal circumstances the output queue throughput enforces the overall processing speed because the input gets throttled when the output cannot process more events.

If you set this parameter to a positive value, the output is still limited either by throughput limits or simply by network performance, but the inputs keep reading events as fast as they can (and they can do it really fast). So the events get processed by the input and parsing queues, then get pushed to the output queue, which has no room for them, so they get dropped. (Setting it to 0 or -1 keeps the blocking behavior, so it would not help with your outage scenario.)

jangusprangus
Engager

Hey @PickleRick ,

Thank you for your answer, it was incredibly helpful.

What I've done is add dropEventsOnQueueFull = 300 (seconds) to the second tcpout group only, and increase maxQueueSize to 50MB. We're going to see how this goes in our environment. The idea behind setting it to 300 seconds is to give our second tcpout group a chance to recover if it's only a minor blip. I hope that my understanding is correct there!
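
The change described above would look roughly like the following in outputs.conf. This is only a sketch: the group names (primary_site, remote_site) and the server host:port values are hypothetical placeholders, and only the dropEventsOnQueueFull and maxQueueSize values come from this thread.

```
# outputs.conf on the forwarder
# (sketch; group names and server addresses are hypothetical placeholders)
[tcpout]
defaultGroup = primary_site, remote_site

[tcpout:primary_site]
# First group is left at its defaults, so a full queue blocks rather than drops.
server = idx1.example.com:9997, idx2.example.com:9997

[tcpout:remote_site]
server = remote-idx1.example.com:9997, remote-idx2.example.com:9997
# Larger output queue to absorb short outages of the remote site.
maxQueueSize = 50MB
# Once the queue is full, wait 300 seconds, then drop new events for this
# group until the queue has space again.
dropEventsOnQueueFull = 300
```

Comments are kept on their own lines here, on the assumption that a trailing comment on a setting line would be read as part of the value. Scoping dropEventsOnQueueFull to the remote group's stanza is what limits the accepted data loss to that site.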

Thanks again for your answer.
