Has anyone come across this message in an environment where distributed search is used:
Unable to distribute to peer named xy at uri https://xy:8089 because peer has status = "Authentication Failed"
I sometimes get this error for all the search peers at the same time. I have alerts that send emails, but because authentication fails on all the search peers, the search head does not find any results. When I run the search manually in Splunk Web later on, the search works.
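To confirm whether the failures really hit every peer at once, the messages could be tallied per peer. This is only a rough sketch: the pattern is taken from the message quoted above, the function name tally_peer_failures is made up for illustration, and the log lines your version actually writes may be formatted differently.

```python
import re
from collections import Counter

# Pattern based on the message quoted in this post; real log lines may
# carry extra prefixes (timestamps, log level), so we search, not match.
PEER_ERROR = re.compile(
    r'Unable to distribute to peer named (?P<peer>\S+) at uri (?P<uri>\S+) '
    r'because peer has status = "(?P<status>[^"]+)"'
)

def tally_peer_failures(lines):
    """Count error messages per (peer name, status) pair."""
    counts = Counter()
    for line in lines:
        match = PEER_ERROR.search(line)
        if match:
            counts[(match.group("peer"), match.group("status"))] += 1
    return counts

if __name__ == "__main__":
    # Illustrative input only: the first line repeats the message above.
    sample = [
        'Unable to distribute to peer named xy at uri https://xy:8089 '
        'because peer has status = "Authentication Failed"',
        'some unrelated log line',
    ]
    for (peer, status), n in tally_peer_failures(sample).items():
        print(peer, status, n)
```

If every peer shows up with the same status in the same time window, that points to something on the search head or the network rather than to one bad peer.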
Another thing to check is the performance metrics on your indexers. Shortly after I first noticed this post and commented that I was seeing the same thing, I also saw similar messages, such as Splunk being unable to send the bundle to the search peer. So I looked into some of the performance metrics and found that there was a LOT of network traffic hitting the indexers. At that point I had not yet increased the number of indexers, but I had added nearly 100 forwarders on top of the several hundred already in operation, each sending data from at least 5 busy logs. There were also roughly 20-25 users or scheduled searches running at any one time. The problem cleared once my rearchitecture of that Splunk instance was completed and more indexers were available to spread the work across.
Please check whether you are using the same serverName value (in server.conf) across different Splunk instances.
I had exactly the same problem when I installed several Splunk instances on the same machine under the same OS account.
After changing the serverName values so that each instance had a different one, the problem disappeared.
Hope this helps.
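A quick way to check for clashing serverName values is to compare the [general] stanza of each instance's server.conf. The script below is a minimal sketch, not an official tool: the function names and the example paths are made up for illustration, and the parsing only handles simple key = value lines.

```python
from collections import defaultdict

def read_server_name(conf_text):
    """Return serverName from the [general] stanza of server.conf text, or None."""
    stanza = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            stanza = line[1:-1].strip()
            continue
        if stanza == "general" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip() == "serverName":
                return value.strip()
    return None

def find_duplicate_server_names(confs):
    """confs maps a label (for example a file path) to server.conf text.

    Returns {serverName: [labels]} for every name used more than once.
    """
    by_name = defaultdict(list)
    for label, text in confs.items():
        name = read_server_name(text)
        if name is not None:
            by_name[name].append(label)
    return {n: sorted(labels) for n, labels in by_name.items() if len(labels) > 1}

if __name__ == "__main__":
    # Hypothetical install locations; replace with your own.
    paths = [
        "/opt/splunk1/etc/system/local/server.conf",
        "/opt/splunk2/etc/system/local/server.conf",
    ]
    confs = {}
    for path in paths:
        try:
            with open(path, encoding="utf-8") as handle:
                confs[path] = handle.read()
        except OSError:
            pass  # file missing or unreadable, skip it
    print(find_duplicate_server_names(confs))
```

Any name that is reported with more than one path is a candidate for the change described above.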
Thank you for the answer. I asked this question a while ago and no longer have access to that particular Splunk installation, so I can't verify it, but I don't think that this was the issue.
So it's still a mystery why this occurs. The error occurs for peers that normally work fine, but randomly and for no known reason they fail and throw this error.
It would be nice if Splunk would respond with a solution, a workaround, or at least an explanation.
I don't think Splunk Answers is a good place for resolving problems.
Please help me with the following error in distributed search:
"Encountered the following error while trying to update: In handler 'distsearch-peer': Error while sending public key to search peer: Connection reset by peer"
I also added the public key from the Splunk server, /opt/splunk/etc/auth/distServerKeys/trusted.pem, to the splunkforwarder at /opt/splunkforwarder/etc/auth//trusted.pem
Did I do this correctly, or should I solve it another way?
The only preliminary hint I've had so far is a suggestion that it might be down to I/O overload. The question put to me was whether the disk subsystem had sufficient IOPS to match Splunk's hardware recommendations, i.e. "it could be overloaded indexers."
The below is more of a comment than an answer I think. Might be worthwhile un-ticking this (if you can) so that anyone who might have an answer won't skip it because it's already been covered.
I've never actively solved this problem, but I'm not getting it any more at the moment. There is another post here, http://splunk-base.splunk.com/answers/42495/unable-to-distribute-to-peer-adjustable-timeout, with more information but no solution.