We experience heavy search loads a few times a day. This can cause the Search Peers to drop from the Search Heads, resulting in the dreaded yellow triangles on dashboard panels with the message "Unable to distribute to peer..."
After the heavy load is gone, Search Heads often don't reconnect to the Peers, or at least not in a timely manner. However, if I drop into the settings and disable/enable, it always reconnects immediately.
Is there a way to increase the frequency of the retries? (Or enable it at all, because seriously, sometimes I've waited hours only to disable/enable and have it work immediately.) I found this in the spec file, but...
checkTimedOutServersFrequency = <integer, in seconds>
* This setting is no longer supported, and will be ignored
I'd first concentrate on WHY you are having heavy search issues on the peers. If you have enterprise support, I'd open a ticket and attach a diag so support can help you figure out what's going on.
Yes, we have enterprise support. One of our products' apps (big corp, many products, each with its own app) has scheduled all of their summary searches at the top of the hour. Hundreds of them. We're working on resolving this, but it has not happened yet. Still, heavy loads will happen on occasion. I'd like SHs to reconnect ASAP, not hours down the road.
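For the top-of-the-hour pile-up, one option that may be worth testing is letting the scheduler spread those summary searches out. Below is a minimal sketch for savedsearches.conf. The stanza name and the skew value are hypothetical, so check the savedsearches.conf spec for your version before relying on it. This only eases the load spike; it does not change how the SHs reconnect to the Peers.

```
# savedsearches.conf (stanza name and values are hypothetical examples)
[product_a_hourly_summary]
# Still scheduled hourly, at the top of the hour
cron_schedule = 0 * * * *
# Allow the scheduler to shift the actual start time by up to 10 minutes
allow_skew = 10m
```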
I don't think it will. It's like the SHs are not even trying to reconnect. If I disable/enable the Peer in settings, it connects immediately. Setting longer timeouts on the connection process won't help if it's not trying. Thanks, tho.
connectionTimeout =
* Amount of time in seconds to use as a timeout during search peer connection establishment
This stanza controls the timing settings for connecting to a remote peer and the send timeout:
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60
Will this not help?