Indexer fails to join back cluster due to standalone buckets

Splunk Employee

An indexer in the cluster was abruptly shut down and subsequently failed to rejoin the cluster. Please provide the steps to clean up the standalone buckets so that the indexer can rejoin the cluster.

Warning message in splunkd.log:
xx-xx-xxxx xx:xx:xx.xxx -0500 WARN CMSlave - Failed to register with cluster master reason: failed method=POST path=/services/cluster/master/peers/?output_mode=json master=xxx.xxx.xxx:8089 rv=0 gotConnectionError=0 gotUnexpectedStatusCode=1 actual_response_code=500 expected_response_code=2xx status_line="Internal Server Error" socket_error="No error" remote_error=Cannot add peer=xxx.xxx.xxx.xxx mgmt_port=8089 (reason: bucket already added as clustered, peer attempted to add again as standalone. guid=C199873F-6E72-43D8-B54F-554750ACE904 bid=mibatch~314~C199873F-6E72-43D8-B54F-554750ACE904). [ event=addPeer status=retrying AddPeerRequest: { _id= active_bundle_id=403F2E7869E35F5BB8C945D993035AA2 add_type=Initial-Add base_generation_id=0 batch_serialno=7 batch_size=18 forwarderdata_rcv_port=9997 forwarderdata_use_ssl=0 last_complete_generation_id=0 latest_bundle_id=403F2E7869E35F5BB8C945D993035AA2 mgmt_port=8089 name=C199873F-6E72-43D8-B54F-554750ACE904 register_forwarder_address= register_replication_address= register_search_address= replication_port=8003 replication_use_ssl=0 replications= server_name=xxx.xxx.xxx site=default splunk_version=7.2.0 splunkd_build_number=8c86330ac18 status=Up } ].

Re: Indexer fails to join back cluster due to standalone buckets

Splunk Employee

When the indexer is disabled as a search peer, its hot buckets are rolled to warm using the standalone bucket naming convention. When the peer is subsequently re-enabled, the cluster master still remembers those buckets as clustered and expects them to follow the clustered bucket naming convention; since they do not, it rejects the peer's request to rejoin the cluster. More details in the docs topic "Unable to disable and re-enable a peer".
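
To make the mismatch concrete, here is a minimal Python sketch (the naming patterns below are inferred from the bucket names quoted in this thread, not taken from official documentation) that classifies a warm bucket directory name as standalone or clustered:

    import re

    # Standalone warm bucket: db_<newest_time>_<oldest_time>_<local_id>
    STANDALONE = re.compile(r"^db_\d+_\d+_\d+$")
    # Clustered warm bucket: the same fields plus the originating peer's GUID
    CLUSTERED = re.compile(r"^db_\d+_\d+_\d+_[0-9A-Fa-f-]+$")

    def classify(name):
        if STANDALONE.match(name):
            return "standalone"
        if CLUSTERED.match(name):
            return "clustered"
        return "unknown"

    print(classify("db_1550812574_1550720467_53"))  # standalone
    print(classify("db_1550812574_1550720467_53_C199873F-6E72-43D8-B54F-554750ACE904"))  # clustered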

Here are the steps to rename the standalone buckets to the clustered bucket convention (a Python sketch automating steps 2 and 3 follows the list):

  1. Search for the offending standalone buckets in the bucket directories (default location: $SPLUNK_HOME/var/lib/splunk/*/db/).
  2. Scan through the indexes' db folders to find the standalone buckets. Problematic standalone buckets follow the naming convention db_<newest_time>_<oldest_time>_<local_id>, e.g. db_1550812574_1550720467_53.
  3. Append the peer's GUID (the guid= value in the splunkd.log warning) to each standalone bucket: rename db_<newest_time>_<oldest_time>_<local_id> to db_<newest_time>_<oldest_time>_<local_id>_<guid>, e.g. db_1550812574_1550720467_53_C199873F-6E72-43D8-B54F-554750ACE904.
  4. Restart the indexer; it will then rejoin the cluster.
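
Here is a minimal Python sketch automating steps 2 and 3. It is a dry run by default; the path, GUID, and naming pattern are assumptions based on this thread, so verify them against your own environment before renaming anything:

    import glob
    import os
    import re

    # Assumed values: adjust for your environment.
    SPLUNK_DB = os.path.expandvars("$SPLUNK_HOME/var/lib/splunk")
    PEER_GUID = "C199873F-6E72-43D8-B54F-554750ACE904"  # guid= value from the splunkd.log warning

    # Standalone warm bucket: db_<newest_time>_<oldest_time>_<local_id>
    STANDALONE = re.compile(r"^db_\d+_\d+_\d+$")

    def rename_standalone_buckets(dry_run=True):
        """Append the peer GUID so standalone buckets match the clustered convention."""
        for bucket in glob.glob(os.path.join(SPLUNK_DB, "*", "db", "db_*")):
            name = os.path.basename(bucket)
            if STANDALONE.match(name):
                target = bucket + "_" + PEER_GUID
                print(name, "->", os.path.basename(target))
                if not dry_run:
                    os.rename(bucket, target)

    if __name__ == "__main__":
        # Review the printed renames first, then re-run with dry_run=False.
        rename_standalone_buckets(dry_run=True)

Stop splunkd on the peer before renaming, and only restart it afterwards (step 4).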

Re: Indexer fails to join back cluster due to standalone buckets

Path Finder

Thanks, Keio! Clarification: in step #2, "Scan through the indexes' db folders" means $SPLUNK_HOME/var/lib/splunk/*/db/, not just $SPLUNK_HOME/var/lib/splunk/defaultdb/db/.

Re: Indexer fails to join back cluster due to standalone buckets

Splunk Employee

Thanks for the clarification; I have revised the path to the indexes' db folders to $SPLUNK_HOME/var/lib/splunk/*/db/.

Re: Indexer fails to join back cluster due to standalone buckets

Thanks! It helped me recover 2 failed nodes in my cluster.
