Getting an Error setting up Clustering

Path Finder

I am attempting to create a cluster, but I receive an error when I try to add a peer (any peer, for that matter). My setup is 1 VM serving as the master node and 2 physical indexers. The replication factor is set to 2 and the search factor is also 2. When I attempt to add one of the physical indexers to the cluster, this is the message I receive:

failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers master=https://VM-SplunkMN:8089 rv=0 actual_response_code=500 expected_response_code=201 status_line=HTTP/1.1 500 Internal Server Error [ event=addPeer status=retrying replication_address= forwarder_address= search_address= mgmtPort=8089 rawPort=9913 useSSL=false forwarderPort=9910 forwarderPortUseSSL=false serverName=SPLUNK1 activeBundleId=9d924c537e9dea196053cd549f82fbbd status=Up type=Initial-Add baseGen=0 ]

The only thing that stands out to me is the forwarderPort=9910 entry. That is the port I use for server forwarders; I'm not sure why it would show up here.
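
To compare the ports reported in a message like this, the bracketed key=value section can be split into fields. This is only a minimal Python sketch: it assumes the fields are separated by spaces and that values contain no spaces, and the function name is made up for illustration.

```python
import re


def parse_add_peer_fields(message):
    """Return the key=value pairs found between the first '[' and the last ']'."""
    start = message.find("[")
    end = message.rfind("]")
    if start == -1 or end <= start:
        return {}
    body = message[start + 1:end]
    # A value is a run of non-space characters and may be empty
    # (for example "replication_address=").
    # Repeated keys keep the last value: "status" appears twice in this message.
    return dict(re.findall(r"(\w+)=(\S*)", body))


# The message from the post above, split across lines for readability.
SAMPLE = (
    "failed to register with cluster master reason: failed method=POST "
    "path=/services/cluster/master/peers master=https://VM-SplunkMN:8089 "
    "rv=0 actual_response_code=500 expected_response_code=201 "
    "status_line=HTTP/1.1 500 Internal Server Error "
    "[ event=addPeer status=retrying replication_address= forwarder_address= "
    "search_address= mgmtPort=8089 rawPort=9913 useSSL=false forwarderPort=9910 "
    "forwarderPortUseSSL=false serverName=SPLUNK1 "
    "activeBundleId=9d924c537e9dea196053cd549f82fbbd status=Up "
    "type=Initial-Add baseGen=0 ]"
)

fields = parse_add_peer_fields(SAMPLE)
print(fields["rawPort"], fields["forwarderPort"])  # prints: 9913 9910
```

With the fields in a dictionary it is easier to see that the address fields are empty and which ports the peer is announcing.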

Path Finder

I had similar errors and was able to resolve them by changing the replication port number. The issue was that I had set the replication port and the receiving port to the same value (9997). After I dedicated port 9887 to replication in server.conf ([replication_port://9887]) and restarted the indexers and the cluster master, the issue was resolved.
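
A quick way to spot this kind of clash is to compare the port numbers in the stanza headers of the two files. The sketch below is an assumption-laden illustration, not a Splunk tool: it assumes the receiving port is declared as a [splunktcp://<port>] stanza in inputs.conf (the post does not say how it was configured), and the function names are hypothetical.

```python
import re

# Matches stanza headers such as [replication_port://9887] or [splunktcp://9997].
# The optional ':' also accepts the [splunktcp://:9997] spelling.
STANZA_PORT = re.compile(
    r"^\[(replication_port|splunktcp)://:?(\d+)\]", re.MULTILINE
)


def stanza_ports(conf_text):
    """Map stanza kind to the set of port numbers declared in the text."""
    ports = {}
    for kind, port in STANZA_PORT.findall(conf_text):
        ports.setdefault(kind, set()).add(int(port))
    return ports


def shared_ports(server_conf_text, inputs_conf_text):
    """Return ports used for both replication and receiving."""
    replication = stanza_ports(server_conf_text).get("replication_port", set())
    receiving = stanza_ports(inputs_conf_text).get("splunktcp", set())
    return replication & receiving


# The broken setup described above: both stanzas point at 9997.
print(shared_ports("[replication_port://9997]\n", "[splunktcp://9997]\n"))
# After dedicating 9887 to replication there is no overlap.
print(shared_ports("[replication_port://9887]\n", "[splunktcp://9997]\n"))
```

Any port returned by shared_ports is a candidate for the conflict described in this reply.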

0 Karma

Contributor

Hi all,

We had the same issue, caused by a faulty bucket. The bucket name was shown under Messages in the web GUI. We moved the bucket and ran splunk fsck, which solved the issue.

Cheers,

Andreas

0 Karma

Communicator

I have the same issue on an indexer that had to be taken out of the cluster for a while and is now trying to rejoin.
I did touch instance.cfg, which contains the same value as displayed on the cluster master.

0 Karma

Path Finder

I received the same error and I had no connectivity issues.

My chosen method of distribution was to install one instance and then copy the binaries to the other servers.

I changed the server name in etc/system/local/server.conf,
but I missed something else (more on this later).

I had a hunch that it was something to do with an id of the server so I went ahead and installed splunk on each of the servers one by one.

I created the cluster again and had no problems.

Further investigation into why it didn't work led me to:
/proj/splunk/splunk/etc/instance.cfg
If I had changed the GUID in that file to something unique on each server, I reckon it would have worked.
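
For anyone who cloned an install the same way, a duplicate GUID can be checked for before building the cluster. The sketch below is a rough illustration that assumes instance.cfg holds the GUID as a guid key under a [general] stanza; the host names other than indexpeer and the helper names are hypothetical, and the file contents are passed in as strings rather than read from each server.

```python
import configparser
from collections import defaultdict


def read_guid(instance_cfg_text):
    """Return the guid from an instance.cfg text, or None if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(instance_cfg_text)
    return parser.get("general", "guid", fallback=None)


def duplicate_guids(cfg_by_host):
    """Map each GUID shared by more than one host to the hosts that use it."""
    hosts_by_guid = defaultdict(list)
    for host, text in cfg_by_host.items():
        guid = read_guid(text)
        if guid:
            hosts_by_guid[guid.upper()].append(host)
    return {g: hosts for g, hosts in hosts_by_guid.items() if len(hosts) > 1}


# GUIDs taken from log lines elsewhere in this thread, used here as sample data.
configs = {
    "master": "[general]\nguid = 02E2B503-8C98-4690-BD9C-ABAB937BDAE4\n",
    "indexpeer": "[general]\nguid = 02E2B503-8C98-4690-BD9C-ABAB937BDAE4\n",
    "peer2": "[general]\nguid = D4DDF306-0648-4D7E-98B8-F837F439E6C2\n",
}
print(duplicate_guids(configs))
```

Any GUID reported here is shared by two or more instances, which is the situation described above.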

Engager

Thanks! I had the same issue. I was using Amazon AMIs to launch an indexer cluster comprising 3 peer indexers, 1 master indexer and 1 search head. Your answer resolved my issue.
However, I see that the master indexer node now lists two search heads: it is registering itself as a search head in addition to the one I configured separately.

0 Karma

Splunk Employee

In the search UI you will see a system message like this:
Failed to add peer 'guid=02E2B503-8C98-4690-BD9C-ABAB937BDAE4 server name=indexpeer ip=192.168.1.69:8089' to the master. Error=Cannot register a peer with the master's guid.

You are correct: the two systems have the same GUID in instance.cfg, and that must be causing the problem.

The REST endpoint (/services/cluster/master/peers) should return a meaningful error message, but it does not. So if you are debugging this on the slave, all you see is this:
"05-18-2016 16:45:22.102 -0700 WARN CMSlave - Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=ghendrey-mbp.local:8092 rv=0 actual_response_code=500 expected_response_code=201 status_line=Internal Server Error error=No error [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=488D0EABB38D6873F00907580854C72D add_type=Initial-Add base_generation_id=0 latest_bundle_id=488D0EABB38D6873F00907580854C72D mgmt_port=8089 name=02E2B503-8C98-4690-BD9C-ABAB937BDAE4 register_forwarder_address= register_replication_address= register_search_address= replication_port=34572 replication_use_ssl=0 replications= server_name=indexpeer site=default splunk_version=6.4.0 splunkd_build_number=dbd9c8b7bedfe28e2ed0a9140fca47225309167a status=Up } ]."

0 Karma

Splunk Employee

I deleted the GUID from instance.cfg on the peer. A new GUID was created on restart. Problem solved for me.

0 Karma

Path Finder

Folks,

I solved my problem. Here is how:

I had 4 servers in my Splunk farm:
1 - Search Head
1 - Cluster Master
2 - Cluster Peers

I also could not get one of my peers to connect, with the same message.
What it came down to was a firewall blocking communication on the Cluster Peers.

Using nmap, I validated that the following ports were open:
8000 TCP
8089 TCP

The command I used was 'nmap -sS -p 8000-10000 {IP of cluster peer}'.

Once I figured that out, everything worked like a champ.
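
If nmap is not available, a similar reachability check can be sketched in Python. This is not equivalent to the nmap -sS scan above: it performs a full TCP connect per port rather than a SYN scan, and the function name is hypothetical.

```python
import socket


def open_tcp_ports(host, ports, timeout=1.0):
    """Return the ports on host that accept a TCP connection within timeout."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or an unreachable host.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable
```

For example, open_tcp_ports(peer_ip, [8000, 8089]), where peer_ip stands for the address of a cluster peer, should return both ports when the firewall allows them. A port missing from the result may be blocked, or nothing may be listening on it.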

Esteemed Legend

You should click Accept on this answer to close your question. Also, it would help to know what the expected/correct output (and maybe the wrong output) of the command was.

0 Karma

Path Finder

Warning logged on the peer node:

11-30-2012 12:05:54.871 +0800 WARN CMMasterHTTPProxy - failed method=POST path=/services/cluster/master/peers master=https://192.168.102.205:8089 rv=0 actual_response_code=500 expected_response_code=201 status_line=HTTP/1.1 500 Internal Server Error

0 Karma

Path Finder

This is the log I got:

11-30-2012 09:58:48.054 +0800 INFO  CMMaster - Adding bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 (status='Complete' search_status='Searchable' mask=18446744073709551615 checksum= standalone=yes size=1091 genid=0) to peer=D4DDF306-0648-4D7E-98B8-F837F439E6C2

11-30-2012 09:58:48.054 +0800 ERROR CMMaster - event=addPeer guid=D4DDF306-0648-4D7E-98B8-F837F439E6C2 status=failed err="size=332 already committed"
11-30-2012 09:58:48.054 +0800 INFO CMPeer - removing bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 from peer=D4DDF306-0648-4D7E-98B8-F837F439E6C2
11-30-2012 09:58:48.054 +0800 INFO CMMaster - event=addBucketToFix bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 msg='Ignoring standalone bucket'
11-30-2012 09:58:48.054 +0800 ERROR ClusterMasterPeerHandler - Cannot add peer=192.168.102.204 mgmtport=8089 (reason: size=332 already committed)
11-30-2012 09:59:48.093 +0800 INFO ClusterMasterPeerHandler - Add peer info replication_address=192.168.102.204 forwarder_address= search_address= mgmtPort=8089 rawPort=8099 useSSL=false forwarderPort=0 forwarderPortUseSSL=true serverName=splunk-index-02.ntt.com.hk activeBundleId=e42fbfc3436bd89262c70e511d343b91 status=Up type=Initial-Add baseGen=0
11-30-2012 09:59:48.099 +0800 INFO CMMaster - event=removeOldPeer guid=D4DDF306-0648-4D7E-98B8-F837F439E6C2 hostport=192.168.102.204:8089 status=success
11-30-2012 09:59:48.099 +0800 INFO CMMaster - event=addPeer guid=D4DDF306-0648-4D7E-98B8-F837F439E6C2 replication_address=192.168.102.204 forwarder_address= search_address= mgmtPort=8089 rawPort=8099 useSSL=false forwarderPort=0 forwarderPortUseSSL=true serverName=splunk-index-02.ntt.com.hk activeBundleId=e42fbfc3436bd89262c70e511d343b91 status=Up type=Initial-Add baseGen=0 bucket_count=13
11-30-2012 09:59:48.099 +0800 INFO CMMaster - Adding bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 (status='Complete' search_status='Searchable' mask=18446744073709551615 checksum= standalone=yes size=1091 genid=0) to peer=D4DDF306-0648-4D7E-98B8-F837F439E6C2
11-30-2012 09:59:48.099 +0800 ERROR CMMaster - event=addPeer guid=D4DDF306-0648-4D7E-98B8-F837F439E6C2 status=failed err="size=332 already committed"
11-30-2012 09:59:48.099 +0800 INFO CMPeer - removing bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 from peer=D4DDF306-0648-4D7E-98B8-F837F439E6C2
11-30-2012 09:59:48.099 +0800 INFO CMMaster - event=addBucketToFix bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 msg='Ignoring standalone bucket'
11-30-2012 09:59:48.099 +0800 ERROR ClusterMasterPeerHandler - Cannot add peer=192.168.102.204 mgmtport=8089 (reason: size=332 already committed)

0 Karma

New Member

Hi All,

I have a similar problem. All of the machines are VMware virtual machines.
In my case rawPort=9887 and forwarderPort=0.
Thanks for your help,
Tamas

• failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers master=https://192.168.1.73:8089 rv=0 actual_response_code=500 expected_response_code=201 status_line=HTTP/1.1 500 Internal Server Error [ event=addPeer status=retrying replication_address= forwarder_address= search_address= mgmtPort=8089 rawPort=9887 useSSL=false forwarderPort=0 forwarderPortUseSSL=true serverName=splunk activeBundleId=1f449698180e6acdd12c2a003de7c242 status=Up type=Initial-Add baseGen=0 ]

0 Karma

Influencer

Look in splunkd.log on the master node; it will give you more information.
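
As a rough sketch of what to look for there, the snippet below pulls out ERROR entries that mention adding a peer. It assumes the line layout seen in the log excerpts in this thread (date, time, UTC offset, level, component, then " - " and the message); the sample lines are copied from an earlier post, and the function name is made up.

```python
import re

# date time offset, level ERROR, component, " - ", message
ERROR_LINE = re.compile(
    r"^(?P<ts>\S+ \S+ \S+)\s+ERROR\s+(?P<component>\S+) - (?P<message>.*)$"
)


def add_peer_errors(log_text):
    """Return (timestamp, component, message) for ERROR lines about adding a peer."""
    results = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue
        message = match["message"]
        if "addPeer" in message or "Cannot add peer" in message:
            results.append((match["ts"], match["component"], message))
    return results


SAMPLE_LOG = """\
11-30-2012 09:58:48.054 +0800 INFO  CMMaster - Adding bid=_audit~1~D4DDF306-0648-4D7E-98B8-F837F439E6C2 (status='Complete' search_status='Searchable' mask=18446744073709551615 checksum= standalone=yes size=1091 genid=0) to peer=D4DDF306-0648-4D7E-98B8-F837F439E6C2
11-30-2012 09:58:48.054 +0800 ERROR CMMaster - event=addPeer guid=D4DDF306-0648-4D7E-98B8-F837F439E6C2 status=failed err="size=332 already committed"
11-30-2012 09:58:48.054 +0800 ERROR ClusterMasterPeerHandler - Cannot add peer=192.168.102.204 mgmtport=8089 (reason: size=332 already committed)
"""

for ts, component, message in add_peer_errors(SAMPLE_LOG):
    print(ts, component, message)
```

The err= or reason: text in these entries is usually more specific than the generic 500 response that the peer sees.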
