Increase retry frequency for Search Heads to connect with Search Peers

twinspop
Influencer

We experience heavy search loads a few times a day. This can cause the Search Peers to drop from the Search Heads, resulting in the dreaded yellow triangles on dashboard panels: "Unable to distribute to peer..."

After the heavy load is gone, Search Heads often don't reconnect to the Peers, or at least not in a timely manner. However, if I drop into the settings and disable/enable, it always reconnects immediately.

Is there a way to increase the frequency of the retries? (Or enable it at all, because seriously, sometimes I've waited hours only to disable/enable and have it work immediately.) I found this in the spec file, but...

checkTimedOutServersFrequency = <integer, in seconds>
* This setting is no longer supported, and will be ignored

bosburn_splunk
Splunk Employee

I'd first concentrate on WHY you are having heavy search issues on the peers. If you have enterprise support, I'd open a ticket and attach a diag so support can help you figure out what's going on.

twinspop
Influencer

Yes, we have enterprise support. One of our products' apps (big corp, many products, each with its own app) has scheduled all of their summary searches at the top of the hour. Hundreds of them. We're working on resolving this, but it has not happened yet. Still, heavy loads will happen on occasion. I'd like SHs to reconnect ASAP, not hours down the road.

twinspop
Influencer

I don't think it will. It's like the SHs are not even trying to reconnect. If I disable/enable the Peer in settings, it connects immediately. Setting longer timeouts on the connection process won't help if it's not trying. Thanks, tho.
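
Until there's a real fix, the manual disable/enable step could probably be scripted against the Search Head's management port. What follows is only a rough sketch, not a confirmed method: it assumes the REST endpoint `search/distributed/peers/{name}` accepts `disable` and `enable` actions the way Splunk Web appears to use them, and that basic auth is allowed. The helper names (`peer_action_url`, `bounce_peer`) and the host names are made up for illustration, so check the REST API reference for your version before relying on it.

```python
import base64
import ssl
import urllib.parse
import urllib.request


def peer_action_url(search_head: str, peer: str, action: str) -> str:
    """Build the management-port URL for a peer action.

    ASSUMPTION: the peer endpoint exposes 'disable' and 'enable' actions;
    this path is not confirmed anywhere in this thread.
    """
    if action not in ("disable", "enable"):
        raise ValueError(f"unsupported action: {action}")
    # Peer names look like host:port, so the colon has to be URL-encoded.
    name = urllib.parse.quote(peer, safe="")
    return f"https://{search_head}/services/search/distributed/peers/{name}/{action}"


def bounce_peer(search_head, peer, username, password, insecure=False):
    """Disable, then re-enable one search peer (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    context = ssl.create_default_context()
    if insecure:
        # Only for self-signed certificates in a lab; skips TLS verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    for action in ("disable", "enable"):
        request = urllib.request.Request(
            peer_action_url(search_head, peer, action),
            data=b"",
            method="POST",
            headers={"Authorization": f"Basic {token}"},
        )
        # urlopen raises HTTPError on a non-2xx response, so a failed
        # disable stops the loop before enable is attempted.
        with urllib.request.urlopen(request, context=context, timeout=30) as response:
            response.read()


# Example call (placeholder hosts and credentials):
# bounce_peer("sh1.example.com:8089", "idx1.example.com:8089", "admin", "changeme")
```

Run from cron shortly after the top-of-the-hour load, something like this would only paper over the problem. It doesn't explain why the SHs stop retrying on their own.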

linu1988
Champion

connectionTimeout =
* Amount of time in seconds to use as a timeout during search peer connection establishment

This stanza controls the timing settings for connecting to a remote peer and the send timeout:
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60

Will this not help?
