Hi,
we have a search head cluster where a couple of the search heads were removed by shutting down the VMs. In other words, the search heads weren't removed gracefully as they should have been. Now the remaining search heads are complaining because mongodb can't reach the removed search heads. I'm getting the following error messages:
2017-03-23T12:27:42.296Z I NETWORK [ReplExecNetThread-1919] getaddrinfo("prod-searchhead-x") failed: Name or service not known
2017-03-23T12:27:42.290Z I REPL [ReplicationExecutor] Error in heartbeat request to prod-searchhead-x:8191; Location18915 Failed attempt to connect to prod-searchhead-x:8191; couldn't initialize connection to host prod-searchhead-x, address is invalid
Does anyone know how to forcefully remove a host from mongodb in the search head cluster, so that we can get rid of these error messages?
As long as the current SHC is stable, in your situation you can potentially rebuild the SHC by following the doc below:
http://docs.splunk.com/Documentation/Splunk/6.5.2/DistSearch/Handleraftissues#Fix_the_entire_cluster
If only the KV store is complaining and the SHC itself is no longer looking for the removed SH node, "kvstore resync" will remove the node from the list. Please follow the doc below:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/ResyncKVstore
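For reference, the resync route can be sketched roughly as below. This is a minimal sketch rather than a verified procedure: the /opt/splunk default path and the `run` dry-run helper are assumptions made for illustration, and as I understand the docs the resync should be run on the member that holds the most recent KV store data. By default the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Assumption: Splunk is installed under $SPLUNK_HOME (default /opt/splunk).
SPLUNK_BIN="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"

# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Confirm the SHC itself no longer lists the removed member.
run "$SPLUNK_BIN" show shcluster-status

# 2. Check which members the KV store still knows about.
run "$SPLUNK_BIN" show kvstore-status

# 3. Resync the KV store from this member.
run "$SPLUNK_BIN" resync kvstore

# 4. Check the KV store status again; the removed host should be gone.
run "$SPLUNK_BIN" show kvstore-status
```

The CLI will prompt for admin credentials if you are not already logged in. Comparing the output of steps 2 and 4 is a quick way to see whether the stale host has dropped out of the member list.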
The "splunk resync kvstore" command was just what I was looking for, thanks! Worked like a charm.
Since it is a VM, is there some reason you cannot simply restore the machine and follow the proper procedure:
https://docs.splunk.com/Documentation/Splunk/6.5.2/DistSearch/Removeaclustermember
This documentation has the following note:
Important: You must use the procedure documented here to remove a member from the cluster. Do not just stop the member.
Thanks for your comment. We are aware of the procedure in the Splunk documentation, but in this case, what's done is done. We could perhaps set up a new VM, install Splunk on it, and "trick" the search head cluster into believing that this new instance was the previously deleted instance. Then we could properly remove it, though this method takes a lot of extra work. A simpler way would be preferable.