Why is my search head cluster not working after updating to Splunk 6.5.0?

Builder

My search head cluster is no longer working after an update from 6.4 to 6.5.0 (I think it was the update!). It seemed to work fine right after the update, but when I came in today it was not working. Here are the log messages:

10-03-2016 17:10:37.586 +0000 WARN  DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer = http://10.0.8.7:8089 , Connect Timeout
10-03-2016 17:10:37.586 +0000 ERROR DistributedPeerManagerHeartbeat - Status 502 while sending public key to cluster search peer http://10.0.8.8:8089:
10-03-2016 17:10:37.586 +0000 WARN  DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer = http://10.0.8.71:8089 , Connect Timeout
10-03-2016 17:10:37.586 +0000 ERROR DistributedPeerManagerHeartbeat - Status 502 while sending public key to cluster search peer http://10.0.8.7:8089:

Please advise. It seems as though something happened to SSL in the update.
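Grouping the heartbeat messages above by search peer makes it easier to see which peers fail and how. The following is a minimal Python sketch, assuming the log lines follow exactly the two message formats quoted above; `failing_peers` is a hypothetical helper, not part of Splunk.

```python
import re
from collections import defaultdict

# "Send failure while pushing PK to search peer = <url> , <reason>"
PUSH_FAILURE = re.compile(
    r"DistributedPeerManagerHeartbeat - Send failure while pushing PK to search peer = "
    r"(?P<peer>\S+) , (?P<reason>.+)$"
)
# "Status <code> while sending public key to cluster search peer <url>:"
STATUS_FAILURE = re.compile(
    r"DistributedPeerManagerHeartbeat - Status (?P<status>\d+) while sending public key "
    r"to cluster search peer (?P<peer>\S+?):?$"
)


def failing_peers(lines):
    """Map each search peer URL to the list of failures logged for it, in log order."""
    failures = defaultdict(list)
    for line in lines:
        line = line.rstrip()
        match = PUSH_FAILURE.search(line)
        if match:
            failures[match.group("peer")].append(match.group("reason"))
            continue
        match = STATUS_FAILURE.search(line)
        if match:
            failures[match.group("peer")].append("HTTP " + match.group("status"))
    return dict(failures)
```

Run over the four lines quoted above, this shows 10.0.8.7 with both a connect timeout and a 502, while 10.0.8.8 and 10.0.8.71 each show one failure. A connect timeout happens at the TCP level, before any SSL handshake, so those lines point more toward network reachability than toward SSL.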

Builder

This has been resolved. It turns out that a teammate of mine made some network changes and no one was aware. What was actually done, I don't know, but what I do know is that it works now.

Builder

So I think the issue here is with one of the indexing servers. Here is an entry from splunkd.log on that server (which is showing as offline in the Splunk UI under indexer clustering):

10-03-2016 20:07:52.686 +0000 WARN  CMSlave - Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=logmaster.gehccloud.com:8089 rv=0 gotConnectionError=0 gotUnexpectedStatusCode=1 actual_response_code=500 expected_response_code=2xx status_line="Internal Server Error" socket_error="No error" [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=A4631FE13867828214C38927C4758A0C add_type=Initial-Add base_generation_id=0 forwarderdata_rcv_port=9997 forwarderdata_use_ssl=1 latest_bundle_id=A4631FE13867828214C38927C4758A0C mgmt_port=8089 name=2FA4A693-FFF5-4D18-87A1-AE33D195C81C register_forwarder_address= register_replication_address= register_search_address= replication_port=8080 replication_use_ssl=0 replications= server_name=hdopeusvmlogi1a site=default splunk_version=6.5.0 splunkd_build_number=59c8927def0f status=Up } ].

It looks like this one indexer peer node cannot add itself to the indexer cluster. So this seems to be a problem, if not the problem, with the search head?
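The CMSlave message packs its details into `key=value` pairs, which are easier to read once split out. Here is a minimal Python sketch, assuming each value is either a double-quoted string or a run of non-space characters (possibly empty, as in `_id=`); `parse_fields` is a hypothetical helper, not a Splunk API. Because `status` appears twice in this message (`status=retrying` for the registration attempt and `status=Up` inside the AddPeerRequest), the sketch keeps the first occurrence of each key.

```python
import re

# key=value pairs: the value is a double-quoted string or a (possibly empty) run of non-space chars.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S*)')


def parse_fields(line: str) -> dict:
    """Split a splunkd log message into key=value fields; the first occurrence of a key wins."""
    fields = {}
    for key, value in FIELD.findall(line):
        # Strip surrounding double quotes from quoted values such as status_line="...".
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields.setdefault(key, value)
    return fields
```

Applied to the line above, `actual_response_code` comes out as `500`, `status_line` as `Internal Server Error`, and `gotConnectionError` as `0`. That combination suggests the request did reach the cluster master and the master itself returned the error, so the master's own splunkd.log may be the next place to look.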

Path Finder

Any solution for the above issue? I am also getting the same error.

Super Champion

Did you upgrade all members of the cluster?

Here is the procedure for upgrading from 6.4 to 6.5: http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/UpgradeaSHC

Builder

Yes, I did, following the document.

Builder

The cluster is broken, and I am trying to re-add the nodes, but it thinks they are already in the cluster, which would make sense. How do I delete "ghost" nodes from the cluster?

SplunkTrust

It seems the search head is not connecting to the cluster master. Did anything change, such as the host name or IP address of the search head or the cluster master? Could you verify that communication from the search head to the cluster master is allowed on port 8089?
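One way to verify that the search head can reach the cluster master on port 8089 is a plain TCP connection test run from the search head itself. Here is a minimal Python sketch; `can_reach` is a hypothetical helper, and the 5-second timeout is an arbitrary choice. Note that a successful connect only shows the port is open; it does not exercise SSL or Splunk authentication.

```python
import socket


def can_reach(host: str, port: int = 8089, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError) and DNS failures.
        return False
```

For example, `can_reach("logmaster.gehccloud.com")` would check the cluster master host named in the CMSlave log entry above.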

Builder

I am able to communicate with the master on port 8089. I am wondering if I did the upgrade wrong. I see conflicting information about whether I needed to break up the search head cluster before upgrading it. Is this the case? If so, I did not do that; how do we deal with that situation?
