
Can a forwarder quickly reconnect after a network outage?

hrawat
Splunk Employee

The forwarder applies a backoff (default 300 sec) based on the following default settings:

#######
# Backoff Settings When Unable To Send Events to Indexer
# The settings in this section determine forwarding behavior when there are
# repeated failures in sending events to an indexer ("sending failures").
#######

maxFailuresPerInterval = <integer>
* The maximum number of failures allowed per interval before a forwarder
  applies backoff (stops sending events to the indexer for a specified
  number of seconds). The interval is defined in the 'secsInFailureInterval'
  setting.
* Default: 2

secsInFailureInterval = <integer>
* The number of seconds contained in a failure interval.
* If the number of write failures to the indexer exceeds
  'maxFailuresPerInterval' in the specified 'secsInFailureInterval' seconds,
  the forwarder applies backoff.
* The backoff time period range is 1-10 * 'autoLBFrequency'.
* Default: 1

backoffOnFailure = <positive integer>
* The number of seconds a forwarder backs off, or stops sending events,
  before attempting to make another connection with the indexer.
* Default: 30


Can a forwarder skip backoff?

The forwarder ignores the backoff settings if the following setting is configured:

autoLBFrequencyIntervalOnGroupFailure = <integer>
* When the entire target group is not reachable,
  'autoLBFrequencyIntervalOnGroupFailure' is the amount of time, in seconds,
  that a forwarder waits before attempting to connect to a target host in the
  group.
* While 'autoLBFrequencyIntervalOnGroupFailure' is in effect, 'autoLBFrequency'
  is ignored. Once first connection is established to a group, 'autoLBFrequency'
  comes into effect again.
* This setting is applied only when
  'autoLBFrequencyIntervalOnGroupFailure' is less than 'autoLBFrequency'.
* Every 'autoLBFrequencyIntervalOnGroupFailure' seconds, a new indexer is
  selected randomly from the list of indexers provided in the server setting
  of the target group stanza.
* -1 means this setting is not active.
* Default: -1

  


hrawat
Splunk Employee

`backoffOnFailure` is an unused setting. It may have been active 15 years ago.

The active backoff calculation is based on this line from the spec:

The backoff time period range is 1-10 * 'autoLBFrequency'

Initially the backoff is 1 * autoLBFrequency. If the next attempt fails, the backoff becomes 2 * autoLBFrequency, and it is finally capped at 10 * autoLBFrequency. So by default the maximum backoff is 300 sec, since the default autoLBFrequency is 30 sec.

>I normally set the autoLBFrequency quite low when using asynchronous load balancing.
If you already have a low autoLBFrequency (let's say 10 sec), the max backoff is 100 sec.
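
To make the arithmetic concrete, here is a minimal Python sketch of the 1-10 * autoLBFrequency range. It is not Splunk code: `backoff_seconds` is a hypothetical helper, and the one-step-per-failure growth is an assumption, since only the 1x start, the 2x second step, and the 10x cap are stated here.

```python
def backoff_seconds(consecutive_failures: int, auto_lb_frequency: int = 30) -> int:
    """Return the backoff, in seconds, after a number of consecutive failures.

    Hypothetical helper for illustration only, not part of Splunk.
    """
    # Assumption: the multiplier grows by 1 per failure. The thread only
    # states that it starts at 1x, moves to 2x, and is capped at 10x.
    multiplier = max(1, min(consecutive_failures, 10))
    return multiplier * auto_lb_frequency


# Schedule for the default autoLBFrequency of 30 sec.
default_schedule = [backoff_seconds(n) for n in range(1, 13)]
print(default_schedule)

# Maximum backoff with autoLBFrequency lowered to 10 sec.
print(backoff_seconds(10, auto_lb_frequency=10))
```

With the default of 30 sec the schedule tops out at 300 sec, and with autoLBFrequency = 10 it tops out at 100 sec, matching the numbers above.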

If you set autoLBFrequencyIntervalOnGroupFailure = 1 and the entire group is not reachable, then the max backoff is 1 sec, without changing autoLBFrequency.

>Needs to be lower than the autoLBFrequency, so I'm assuming the autoLBFrequency must be set to a reasonable value to use this setting? 

Not really, both are independent. For a faster attempt to establish the first connection to a group after the entire group was down, whichever of the two settings is lower is used.
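
As a rough illustration of how the two settings interact when the whole group is down, here is a small Python sketch. It is not Splunk code: `max_wait_when_group_down` is a hypothetical helper, and treating 0 as inactive is an assumption (the spec only says -1 means not active).

```python
def max_wait_when_group_down(auto_lb_frequency: int = 30,
                             interval_on_group_failure: int = -1) -> int:
    """Return the longest wait, in seconds, between connection attempts
    while the entire target group is unreachable.

    Hypothetical helper for illustration only, not part of Splunk.
    """
    # Per the spec, the setting applies only when it is less than
    # autoLBFrequency; -1 means it is not active.
    # Assumption: 0 is treated as inactive as well.
    if 0 < interval_on_group_failure < auto_lb_frequency:
        return interval_on_group_failure
    # Otherwise the regular backoff applies, capped at 10 * autoLBFrequency.
    return 10 * auto_lb_frequency


print(max_wait_when_group_down())                             # defaults
print(max_wait_when_group_down(interval_on_group_failure=1))  # group-failure override
```

So a low autoLBFrequencyIntervalOnGroupFailure shortens the wait on its own, without lowering autoLBFrequency.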

gjanders
SplunkTrust

Could I clarify some points here please?


@hrawat wrote:

The forwarder applies a backoff (default 300 sec) based on the following default settings:

#######
# Backoff Settings When Unable To Send Events to Indexer
# The settings in this section determine forwarding behavior when there are
# repeated failures in sending events to an indexer ("sending failures").
#######

secsInFailureInterval = <integer>
* The number of seconds contained in a failure interval.
* If the number of write failures to the indexer exceeds
  'maxFailuresPerInterval' in the specified 'secsInFailureInterval' seconds,
  the forwarder applies backoff.
* The backoff time period range is 1-10 * 'autoLBFrequency'.
* Default: 1

backoffOnFailure = <positive integer>
* The number of seconds a forwarder backs off, or stops sending events,
  before attempting to make another connection with the indexer.
* Default: 30

Under the section "secsInFailureInterval", there is a mention of "* The backoff time period range is 1-10 * 'autoLBFrequency'."

Under backoffOnFailure, it says the default backoff time is 30 seconds.

And you have written "The forwarder applies a backoff (default 300 sec)".

How does the time period add up to 300 seconds?

 

I do also note that the setting:

autoLBFrequencyIntervalOnGroupFailure 

Needs to be lower than the autoLBFrequency, so I'm assuming the autoLBFrequency must be set to a reasonable value to use this setting? I normally set the autoLBFrequency quite low when using asynchronous load balancing.

 

Thanks
