Increase retry frequency for Search Heads to connect with Search Peers

twinspop
Influencer

We experience heavy search loads a few times a day. This can cause the Search Peers to drop from the Search Heads, resulting in the dreaded yellow triangles on dashboard panels with the warning "Unable to distribute to peer..."

After the heavy load is gone, Search Heads often don't reconnect to the Peers, or at least not in a timely manner. However, if I go into the settings and disable and then re-enable the Peer, it always reconnects immediately.

Is there a way to increase the frequency of the retries? (Or enable it at all, because seriously, sometimes I've waited hours only to disable/enable and have it work immediately.) I found this in the spec file, but...

checkTimedOutServersFrequency = <integer, in seconds>
* This setting is no longer supported, and will be ignored

bosburn_splunk
Splunk Employee

I'd first concentrate on WHY you are having heavy search issues on the peers. If you have enterprise support, I'd open a ticket and attach a diag so support can help you figure out what's going on.

twinspop
Influencer

Yes, we have enterprise support. One of our products' apps (big corp, many products, each with its own app) has scheduled all of their summary searches at the top of the hour. Hundreds of them. We're working on resolving this, but it has not happened yet. Still, heavy loads will happen on occasion. I'd like SHs to reconnect ASAP, not hours down the road.

twinspop
Influencer

I don't think it will. It's like the SHs are not even trying to reconnect. If I disable/enable the Peer in settings, it connects immediately. Setting longer timeouts on the connection process won't help if it's not trying. Thanks, tho.
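
For anyone who wants to script the disable/enable workaround rather than click through the settings page, here is a minimal Python sketch. It is not a confirmed fix and it does not change the retry frequency. The endpoint path `/services/search/distributed/peers/<peer>/<action>` is an assumption based on the usual Splunk management-port layout and should be checked against the REST API reference for your version. The function names, the host names, and the `post` callable are hypothetical and are only here for illustration.

```python
from urllib.parse import quote


def peer_action_url(base_url, peer, action):
    """Build the management URL for one action on one search peer."""
    if action not in ("disable", "enable"):
        raise ValueError(f"unsupported action: {action}")
    # ASSUMPTION: this endpoint path is not confirmed anywhere in the thread;
    # verify it against the REST API reference before relying on it.
    # Peer names are usually host:port, so the colon is percent-encoded.
    return (
        f"{base_url.rstrip('/')}/services/search/distributed/peers/"
        f"{quote(peer, safe='')}/{action}"
    )


def bounce_peer(base_url, peer, post):
    """Disable, then re-enable a peer, mirroring the manual workaround.

    `post` is a caller-supplied callable taking a URL and returning an
    HTTP status code, so authentication and TLS handling stay outside
    this sketch.
    """
    statuses = []
    for action in ("disable", "enable"):
        statuses.append(post(peer_action_url(base_url, peer, action)))
    return statuses
```

In practice `post` would wrap an authenticated HTTPS POST to the Search Head's management port (for example with `urllib.request`), and the script could run from cron shortly after the known heavy-load window.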

linu1988
Champion

connectionTimeout =
* Amount of time in seconds to use as a timeout during search peer connection establishment

this stanza controls the timing settings for connecting to a remote peer and the send timeout
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60

Will this not help?
