Developing for Splunk Enterprise

DMC Alerts - Search peer not responding

Splunk Employee

We are using the Splunk DMC to monitor the health of our Splunk infrastructure. For the last few days, the DMC has been alerting that our indexer "XXXX" is not responding.

However, when we log in and check server "XXXX", it is working fine and Splunk is up and running.


Unable to distribute to peer named "XXXX" using the uri-scheme=https because peer has status="Down". Please verify uri-scheme, connectivity to the search peer, that the search peer is up, and an adequate level of system resources are available. See the Troubleshooting Manual for more information.

Any advice on this?


Splunk Employee

The search query for the alert [DMC Alert - Search Peer Not Responding] hits the REST endpoint with splunk_server=local, where "local" refers to the search head. Below is the base search:

| rest splunk_server=local /services/search/distributed/peers/
| where status!="Up"
| fields peerName, status
| rename peerName as Instance, status as Status

It looks like the DMC times out when a search peer is slow to reach, which produces inconsistent results for this alert search.

Below is a workaround:

On the DMC, add the following parameters to $SPLUNK_HOME/etc/system/local/distsearch.conf:

statusTimeout = 120
connectionTimeout = 120
serverTimeout = 120
sendTimeout = 120
receiveTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120

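For reference, a minimal sketch of how that file would look on the DMC, assuming (per my reading of distsearch.conf.spec) that these timeouts all belong under the [distributedSearch] stanza; the 120-second values are the workaround values from above, not Splunk defaults, and note that serverTimeout is deprecated in recent versions in favor of the separate connection/send/receive timeouts:

```ini
# $SPLUNK_HOME/etc/system/local/distsearch.conf on the DMC (sketch)
# Assumption: these settings live under [distributedSearch].
[distributedSearch]
statusTimeout = 120
connectionTimeout = 120
serverTimeout = 120
sendTimeout = 120
receiveTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120
```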
Restart the Splunk service on the DMC for the changes to take effect.


On the indexers, add the following parameters to $SPLUNK_HOME/etc/system/local/distsearch.conf:

connectionTimeout = 120
sendRcvTimeout = 120

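A sketch of the indexer-side file, under the assumption that connectionTimeout and sendRcvTimeout here are the knowledge-bundle replication timeouts, which distsearch.conf.spec places in the [replicationSettings] stanza:

```ini
# $SPLUNK_HOME/etc/system/local/distsearch.conf on each indexer (sketch)
# Assumption: these timeouts map to [replicationSettings],
# which governs bundle-replication connections.
[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120
```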
Restart the Splunk service on the indexers for the changes to take effect.

Hope it helps!