Increase retry frequency for Search Heads to connect with Search Peers

twinspop
Influencer

We experience heavy search loads a few times a day. This can cause the Search Peers to drop from the Search Heads, resulting in the dreaded yellow triangles on dashboard panels with the message "Unable to distribute to peer..."

After the heavy load is gone, Search Heads often don't reconnect to the Peers, or at least not in a timely manner. However, if I drop into the settings and disable/enable, it always reconnects immediately.

Is there a way to increase the frequency of the retries? (Or enable it at all, because seriously, sometimes I've waited hours only to disable/enable and have it work immediately.) I found this in the spec file, but...

checkTimedOutServersFrequency = <integer, in seconds>
* This setting is no longer supported, and will be ignored

bosburn_splunk
Splunk Employee

I'd first concentrate on WHY you are having heavy search issues on the peers. If you have enterprise support, I'd open a ticket and attach a diag so support can help you figure out what's going on.

twinspop
Influencer

Yes, we have enterprise support. One of our products' apps (big corp, many products, each with its own app) has scheduled all of their summary searches at the top of the hour. Hundreds of them. We're working on resolving this, but it has not happened yet. Still, heavy loads will happen on occasion. I'd like SHs to reconnect ASAP, not hours down the road.
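
For context, one common way to spread out searches that all fire at the top of the hour is to offset their cron minutes or to let the scheduler skew the start time. The fragment below is only a sketch of savedsearches.conf, assuming a Splunk version that supports `allow_skew`. The stanza names and values are hypothetical and are not taken from the app described above.

```ini
# savedsearches.conf (sketch; stanza names and values are hypothetical)

[Hypothetical Summary Search A]
# Run at 7 minutes past the hour instead of exactly on the hour.
cron_schedule = 7 * * * *

[Hypothetical Summary Search B]
# Keep the top-of-the-hour schedule, but allow the scheduler to
# shift the actual start time by up to 10 minutes.
cron_schedule = 0 * * * *
allow_skew = 10m
```

This reduces the peak load but does not change how Search Heads reconnect to Peers once a Peer has been marked down.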

twinspop
Influencer

I don't think it will. It's like the SHs are not even trying to reconnect. If I disable/enable the Peer in settings, it connects immediately. Setting longer timeouts on the connection process won't help if it's not trying. Thanks, though.

linu1988
Champion

connectionTimeout =
* Amount of time in seconds to use as a timeout during search peer connection establishment

This stanza controls the timeouts for connecting to a remote peer and for sending data:
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60

Will this not help?
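
To show where these settings would live, here is a minimal sketch of distsearch.conf on the Search Head. It assumes the `[distributedSearch]` and `[replicationSettings]` stanzas and the setting names below are present in your version's distsearch.conf.spec, so check the spec file before using any of them. The values are illustrative only and are not recommendations.

```ini
# distsearch.conf on the Search Head (sketch; verify every setting
# against distsearch.conf.spec for your Splunk version)

[distributedSearch]
# Seconds to wait while establishing a connection to a search peer.
connectionTimeout = 10
# Seconds to wait while sending data to, or receiving data from, a peer.
sendTimeout = 30
receiveTimeout = 60

[replicationSettings]
# Seconds to wait when connecting to a peer for bundle replication,
# and when sending or receiving replication data.
connectionTimeout = 10
sendRcvTimeout = 60
```

Note that all of these bound how long a single attempt may take. None of them sets how often the Search Head retries a Peer it has already marked as down, which is the behavior the original question asks about.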
