
Why is splunkd.log reporting lots of "DistributedPeerManager - Unable to distribute to peer named...because peer has status = "Down"" messages?


I have a very busy search head that logs the following:

DistributedPeerManager - Unable to distribute to peer named slxxxxxxxxx:9089 at uri https://xxxxxxxx037:9089 because peer has status = "Down" 

The messages start appearing in splunkd.log at 22:08:10.971 and stop at 22:09:46.994, and the message is reported about 60 times during that short period. A telnet from the SH to the indexer on port 9089 shows no connectivity issues.
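To confirm how often and for how long each peer is reported as "Down", a small script can count the messages per peer and record the first and last timestamp. This is a minimal sketch, assuming splunkd.log lines follow the layout `MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message`; the function name `summarize_down_messages` and the peer names used in any examples are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed splunkd.log layout: "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message".
# Only DistributedPeerManager lines reporting status = "Down" are matched.
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+\S+\s+\S+\s+'
    r'DistributedPeerManager - Unable to distribute to peer named (?P<peer>\S+)'
    r'.*status = "Down"'
)


def summarize_down_messages(lines):
    """Return {peer: (count, first_timestamp, last_timestamp)} for "Down" messages."""
    counts = defaultdict(int)
    first = {}
    last = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a DistributedPeerManager "Down" message
        ts = datetime.strptime(match.group("ts"), "%m-%d-%Y %H:%M:%S.%f")
        peer = match.group("peer")
        counts[peer] += 1
        first.setdefault(peer, ts)  # keep the earliest occurrence seen
        last[peer] = ts             # overwrite so the latest occurrence wins
    return {peer: (counts[peer], first[peer], last[peer]) for peer in counts}
```

Usage: open splunkd.log and pass the file object straight in, e.g. `summarize_down_messages(open("splunkd.log"))`, then print the count and the span `last - first` per peer. A burst of roughly 60 messages inside about 96 seconds, as described above, points to a short-lived condition rather than a persistent outage.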

This has happened off and on for all indexers configured in distributed search. I am wondering whether there is a setting that could be adjusted to prevent these messages, or a conf value that could be adjusted to improve performance under high load. The SH has 10 vCPUs and 32 GB of RAM, and load averages are high on both the SH and the indexers (lots of searches).

The messages appear to have no negative impact, since searches are working and users are not reporting any issues.



Hi lisaac, given how busy the hosts involved in the search are, it seems reasonable that momentary periods of high latency could generate these messages. Various timeout settings described in http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Distsearchconf can adjust what the environment expects, for instance:

# this stanza controls the timing settings for connecting to a remote peer and
# the send timeout
[replicationSettings]
connectionTimeout = 10
sendRcvTimeout = 60
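A single telnet only shows that a connection can be made at that moment; it says nothing about latency spikes under load. The sketch below reads the timeouts from a distsearch.conf-style stanza and times repeated TCP connects to a peer, so occasional slow connects can be compared against `connectionTimeout`. It is a rough check under stated assumptions, not how Splunk itself decides a peer is "Down": it assumes the file parses as plain INI (Python's `configparser` rejects keys placed before any stanza), and the helper names `read_replication_timeouts` and `probe_connect_times` are hypothetical.

```python
import configparser
import socket
import time


def read_replication_timeouts(conf_text):
    """Return (connectionTimeout, sendRcvTimeout) from a [replicationSettings] stanza.

    Assumes INI-compatible text; interpolation is disabled so literal '%' is safe.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["replicationSettings"]
    return section.getint("connectionTimeout"), section.getint("sendRcvTimeout")


def probe_connect_times(host, port, attempts=5, timeout=10.0, pause=1.0):
    """Time repeated TCP connects in seconds; None marks a failed or timed-out attempt."""
    samples = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Only the TCP handshake is timed; no TLS or HTTP request is made.
            with socket.create_connection((host, port), timeout=timeout):
                samples.append(time.monotonic() - start)
        except OSError:  # includes refused connections and socket timeouts
            samples.append(None)
        time.sleep(pause)
    return samples
```

Usage: read the timeouts from the search head's distsearch.conf, then probe a peer (the host name here is hypothetical), for example `probe_connect_times("idx01.example.com", 9089, attempts=30, timeout=conn_timeout)`. Running it during a busy period and counting `None` results or samples close to the timeout gives a rough idea of whether raising the timeout is likely to help.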


Did this start happening after a recent 6.3 upgrade? What platform are you running?

I've also seen this message recently, following my 6.3 upgrade and some new app installs.
