My cluster peers went down after I made some changes to indexes.conf on the cluster master, and now I am unable to add a peer in our two-peer environment. Our Splunk instances are all 5.0.4, running on RHEL.
In splunkd.log, I see the following types of errors on the cluster master:
12-16-2013 18:25:19.050 -0800 ERROR CMMaster - event=addPeer guid=XXXXXXXX-XXX-XXX-XXX-XXXXXXXXXXXX status=failed err="Adding a non-standalone bucket as standalone!"
12-16-2013 18:25:19.052 -0800 ERROR ClusterMasterPeerHandler - Cannot add peer=xxx.xxx.xxx.xxx mgmtport=8089 (reason: Adding a non-standalone bucket as standalone!)
We stopped the cluster master and both peers, removed the bucket being complained about, then brought up both peers and the master. After that, the peer which was previously reported as down was listed as up. However, the peer which had been up was now listed as down, and the logs showed similar errors about this peer.
Has anyone else seen this, and what can be done to resolve this error?
We saw this problem after editing indexes.conf, which was (by mistake) being done in $SPLUNK_HOME/etc/master-apps/_cluster/default/indexes.conf. Since new indexes were being added, our theory is that something went wrong during this process, causing the peers to believe they were no longer members of the cluster and to start indexing buckets as standalone. We saw a number of buckets created without the instance GUID appended to their names, which is the naming format of standalone buckets.
To resolve the issue so that we could add the peers, we needed to remove, on each peer, all of the buckets flagged as standalone in the master's log messages.
The messages we used looked like this:
12-16-2013 18:25:19.050 -0800 INFO CMMaster - Adding bid=<index_name>~<bucketid>~<instanceGUID> (status='Complete' search_status='Searchable' mask=xxxxxxxxxxxxxxxxxxxxx checksum= standalone=yes size=1007447 genid=0) to peer=<peerGUID>
We found about 14 buckets across both peers that were created without the GUID and/or were listed as standalone buckets.
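A quick way to pull the full list of affected buckets from the master's splunkd.log is a grep like the one below. This is a sketch only; it assumes the default log location and the message format shown above:
$> grep 'CMMaster' $SPLUNK_HOME/var/log/splunk/splunkd.log | grep 'standalone=yes'
$> grep 'CMMaster' $SPLUNK_HOME/var/log/splunk/splunkd.log | grep 'standalone=yes' | sed 's/.*Adding bid=\([^ ]*\).*/\1/' | sort -u
The second command just extracts the unique bucket IDs (bid values) from those messages.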
---- Steps taken to restore the peers to the cluster ----
Removed the standalone buckets identified above from each peer (moving them to a /temp/ directory), and placed our indexes.conf changes in the correct locations:
$SPLUNK_HOME/etc/master-apps/_cluster/local/indexes.conf (on the master)
$SPLUNK_HOME/etc/slave-apps/_cluster/local/indexes.conf (distributed to the peers)
At this point, the cluster was back up and working. Now we needed to reintroduce the standalone buckets back into the cluster. Here is the process we followed:
In /temp/, renamed each bucket to append the instance GUID: mv db_<newest_time>_<oldest_time>_<bucketid> db_<newest_time>_<oldest_time>_<bucketid>_<guid>
Copied the renamed db_<newest_time>_<oldest_time>_<bucketid>_<guid> buckets into the $SPLUNK_DB/IndexName/db folder on the local peer.
Created the replicated copy by copying db_<newest_time>_<oldest_time>_<bucketid>_<guid> to rb_<newest_time>_<oldest_time>_<bucketid>_<guid>.
Copied the rb_<newest_time>_<oldest_time>_<bucketid>_<guid> buckets into the $SPLUNK_DB/IndexName/colddb folder of the proper index on the remote peer.
Stopped both peers, then the cluster master (again, order matters).
Started both indexer peers, then the cluster master.
Verified that the buckets were visible via the cluster master REST endpoint by navigating to the SOS app > Indexing > Index Replication > Cluster Master view. Reviewing 'Bucket information' showed our copied buckets as searchable.
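Put together, the per-bucket commands look roughly like this. This is a sketch only: <IndexName>, the timestamps, bucket ID, GUID, and remote peer hostname are placeholders, all Splunk instances are stopped in the order described above, the copy to the remote peer is shown with scp but any copy method works, and the copied directories should remain owned by the splunk user:
$> cd /temp
$> mv db_<newest_time>_<oldest_time>_<bucketid> db_<newest_time>_<oldest_time>_<bucketid>_<guid>
$> cp -rp db_<newest_time>_<oldest_time>_<bucketid>_<guid> $SPLUNK_DB/<IndexName>/db/
$> cp -rp db_<newest_time>_<oldest_time>_<bucketid>_<guid> rb_<newest_time>_<oldest_time>_<bucketid>_<guid>
$> scp -rp rb_<newest_time>_<oldest_time>_<bucketid>_<guid> <remote_peer>:<SPLUNK_DB>/<IndexName>/colddb/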
I got a similar issue:
04-22-2020 23:08:28.867 +0800 ERROR ClusterMasterPeerHandler - Cannot add peer=xxx.xx.xx.xxx mgmtport=8089 (reason: bucket already added as clustered, peer attempted to add again as standalone. guid=2A40ED04-E90B-4771-BD5A-F523865808B6 bid= ~2~2A40ED04-E90B-4771-BD5A-F523865808B6).
Thanks to clustering, my data was safe on another peer node and it was searchable.
There is a "Data Rebalance" feature:
Settings -> Indexer Clustering -> Edit -> Data Rebalance -> select the index which you moved (I kept the threshold at 1, and it worked for me) -> Start
My index was re-created and the data came back to the problem peer automatically; everything was green in the UI, availability was 100%, and all data was searchable.
-Rana
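For reference, the same data rebalance can also be started from the CLI on the cluster master. A sketch, assuming a Splunk version recent enough to support data rebalance and with <index_name> as a placeholder:
$> splunk rebalance cluster-data -action start -index <index_name>
$> splunk rebalance cluster-data -action status
$> splunk rebalance cluster-data -action stop
The status and stop actions are listed for completeness; the UI path described above starts the same operation for the selected index.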
Have to say Raj Pal ROCKS!
So do Masa and John Welch, in no given order!!!!
Arion Holliman - AIG Lead SIEM Engineer
Adding additional detail: after finding the offending bucket as suggested by Rajpal, we simply removed its directory completely. Since the cluster had essentially recovered, removing it did not cause any issues.
Eric.
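In concrete terms, that removal might look like the sketch below, run on the peer holding the standalone bucket. The index path and bucket name are placeholders, and it assumes the bucket's data is already covered by replicated copies elsewhere in the cluster (if unsure, move the directory aside instead of deleting it):
$> $SPLUNK_HOME/bin/splunk stop
$> cd $SPLUNK_DB/<index>/db
$> rm -rf db_<newest_time>_<oldest_time>_<bucketid>
$> $SPLUNK_HOME/bin/splunk start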
We are using version 6.5.2 and I discovered something similar. Our search heads were showing an error: Failed to add peer 'guid=B193F763-99BC-41B5-89D0-CDEF1F1BF36E server name=splunkindexer14 ip=192.168.12.24:8089' to the master.
Error=bucket already added as clustered, peer attempted to add again as standalone. guid=B193F763-99BC-41B5-89D0-CDEF1F1BF36E bid= myindex~38~B193F763-99BC-41B5-89D0-CDEF1F1BF36E
To fix this, I needed to find the bucket:
1) Log into the indexer, become the splunk user, cd to the path of the offending bucket, and find it.
$> su - splunk
$> cd /x/x1/db/<index>
$> ls -lh
2) The problem bucket will stand out like a sore thumb and be easy to spot because its name will be shorter than all the others.
Actual standalone bucket that caused the problem: db_1487878740_1487878740_38
3) Rename it using mv as shown below, appending the peer's GUID from the error message (no double quotes needed):
$> mv db_1487878740_1487878740_38 db_1487878740_1487878740_38_B193F763-99BC-41B5-89D0-CDEF1F1BF36E
4) Now list again (ls | grep) to verify you were successful; you should see the renamed bucket:
$> ls | grep db_1487878740_1487878740_38
db_1487878740_1487878740_38_B193F763-99BC-41B5-89D0-CDEF1F1BF36E
5) Reboot the indexer
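If several standalone buckets turn up (see the question further down about more than one folder), a loop along these lines could rename them all in one pass. A sketch only, assuming the same index path as above and that GUID is set to the peer's own GUID from the error message:
$> cd /x/x1/db/<index>
$> GUID=B193F763-99BC-41B5-89D0-CDEF1F1BF36E
$> for b in $(ls -d db_* rb_* 2>/dev/null | grep -Ev '_[0-9A-Fa-f-]{36}$'); do mv "$b" "${b}_${GUID}"; done
$> ls | grep -c "_${GUID}$"
The last command just counts how many bucket directories now carry this peer's GUID.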
Many thanks Rajpal!
What if there is more than one folder (i.e., more than one name that is shorter than all the others)?
These steps worked on version 7.0.2. I am new to Splunk and implementing it in AWS.
Excellent help: 'The problem bucket will stand out like a sore thumb'.
This worked perfectly on 7.3.0
Thank you Rajpal!
I know this is an old post, but I wonder whether it would not be simpler to just move the primary (db) buckets and let the master make the copies using the regular replication process, instead of copying the rb buckets manually to the target peers? Or am I missing something?
Cheers
Claudio