If a given forwarder has quarantined a connection due to too many failed connection attempts, will the forwarder reattempt a connection after a span of time? If so, what is the reattempt interval for a quarantined connection?
By quarantine, I mean the below:
03-16-2012 14:20:24.904 +1000 INFO TcpOutputProc - Connection to 10.1.2.3:9997 closed. Connection closed by server.
03-16-2012 14:20:24.904 +1000 WARN TcpOutputProc - Applying quarantine to idx=10.1.2.3:9997 numberOfFailures=3
(Example logs pulled from another question, but I know this is what they look like)
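To spot quarantine events in forwarder logs (for example, to timestamp when the forwarder gave up on an indexer), the WARN line above can be picked apart with a regular expression. This is a minimal sketch that assumes the message wording matches the sample exactly; other Splunk versions may phrase it differently, and `parse_quarantine` is a hypothetical helper, not part of Splunk.

```python
import re

# Matches the quarantine WARN line from the sample log. The exact wording is
# an assumption based on that one sample and may differ between versions.
QUARANTINE_RE = re.compile(
    r"WARN\s+TcpOutputProc\s+-\s+Applying quarantine to "
    r"idx=(?P<host>[^:\s]+):(?P<port>\d+)\s+numberOfFailures=(?P<failures>\d+)"
)


def parse_quarantine(line):
    """Return (host, port, failures) for a quarantine line, else None."""
    match = QUARANTINE_RE.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), int(match.group("failures"))


log_line = ("03-16-2012 14:20:24.904 +1000 WARN TcpOutputProc - "
            "Applying quarantine to idx=10.1.2.3:9997 numberOfFailures=3")
print(parse_quarantine(log_line))  # ('10.1.2.3', 9997, 3)
```

The preceding INFO "Connection ... closed" line does not match, so only the quarantine events are returned.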
Hello BryanBerry,
The answer is yes: a forwarder will try to reconnect after it has been quarantined. I believe backoffOnFailure is set to 310 seconds. Two other settings in outputs.conf also help control quarantine: maxFailuresPerInterval and secsInFailureInterval.
You can override this by using autoLBFrequency in your tcp stanza.
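The behaviour described above (count failures, quarantine the indexer, then retry after a backoff) can be modelled roughly as follows. This is a hypothetical sketch, not Splunk's implementation: the sliding-window counting is only loosely inspired by maxFailuresPerInterval and secsInFailureInterval, the threshold of 3 comes from `numberOfFailures=3` in the sample log, the 310-second backoff comes from the interval observed in this thread, and the 60-second window is an arbitrary placeholder.

```python
from collections import deque


class QuarantineModel:
    """Hypothetical model of per-indexer quarantine and retry.

    Assumptions (not Splunk's actual logic):
      * failures inside a sliding window of `window_secs` are counted;
      * once the count reaches `failures_to_quarantine`, the indexer is
        quarantined for `backoff_secs`;
      * after the backoff expires, another connection attempt is allowed.
    """

    def __init__(self, failures_to_quarantine=3, window_secs=60.0, backoff_secs=310.0):
        self.failures_to_quarantine = failures_to_quarantine
        self.window_secs = window_secs
        self.backoff_secs = backoff_secs
        self.failures = deque()  # timestamps (seconds) of recent failures
        self.quarantined_until = None

    def record_failure(self, now):
        self.failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_secs:
            self.failures.popleft()
        if len(self.failures) >= self.failures_to_quarantine:
            self.quarantined_until = now + self.backoff_secs
            self.failures.clear()

    def can_attempt(self, now):
        return self.quarantined_until is None or now >= self.quarantined_until


model = QuarantineModel()
for t in (0.0, 10.0, 20.0):  # three failures within the window
    model.record_failure(t)
print(model.can_attempt(100.0))  # False: still quarantined
print(model.can_attempt(330.0))  # True: 20 s + 310 s backoff has elapsed
```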
Here is one of my previous posts: Splunk-indexer-impact-to-splunk-forwarder-lost-connection-to-splunk-indexer
Also read the section called "Backoff Settings When Unable To Send Events to Indexer" in outputs.conf.
Hope this helps or gets you started. Don't forget to vote up and/or accept answers if they help.
Cheers,
Thanks piebob!
@BryanBerry,
Thanks for the correction.
BryanBerry: for future reference, the spec/example files in the documentation are populated directly from the same files that you see in the same version of Splunk, so if something is not in the docs, you don't need to check the files on disk: it's the same file. That said, I'll make sure the doc team/dev knows that backoffOnFailure is missing from the spec. Thank you!
Checked out http://docs.splunk.com/Documentation/Splunk/5.0.2/Admin/Outputsconf to confirm your findings. I don't see backoffOnFailure defined as a parameter - I only see it referenced by the secsInFailureInterval parameter.
I just tested this manually, and it took 5 minutes and 10 seconds before the forwarder attempted another connection.
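As a quick check of the arithmetic, 5 minutes 10 seconds is 310 seconds. The sketch below also projects when the next attempt would land for the sample quarantine line from the question, assuming the same 310-second interval applies there (the sample log came from a different setup, so this is illustrative only; the `+1000` offset is ignored).

```python
from datetime import datetime, timedelta

# Observed gap between quarantine and the next connection attempt (5 min 10 s).
observed_backoff = timedelta(minutes=5, seconds=10)
print(observed_backoff.total_seconds())  # 310.0

# Timestamp format of the sample log line; the timezone offset is not parsed.
quarantined_at = datetime.strptime("03-16-2012 14:20:24.904", "%m-%d-%Y %H:%M:%S.%f")
next_attempt = quarantined_at + observed_backoff
print(next_attempt.strftime("%m-%d-%Y %H:%M:%S.%f")[:-3])  # 03-16-2012 14:25:34.904
```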
If you change your 30sec to 300 sec / 5 minutes, I'll accept 😄
I also checked the outputs.conf in etc/system/default and saw no reference to backoffOnFailure. Odd.
40 seconds, if I remember correctly.