
Fixing a Splunk SH cluster after one member left (disk failure)

ramesh_babu71
Path Finder

Hi,
We had four members in our SH cluster (all VMs), and the setup was working properly until yesterday. Today the VM console showed an error that it cannot power on one of the SH cluster members, because its disk has been corrupted beyond repair.

For now we can work with 3 SH cluster members, since they can still elect a captain and support our requirements well enough. However, when pushing updates via the deployer, we get an error that it cannot reach the down SH, and deploying apps fails:

./splunk apply shcluster-bundle --answer-yes -target https://splunkSH:8089 -auth username:Password

Error while deploying apps to first member: Error while fetching apps baseline on target=https://192.x.x.x:8089: Network-layer error: No route to host

Please let me know how I can remove the entry for this obsolete SH member from the cluster list.

1 Solution

mayurr98
Super Champion

Hey, follow these steps to remove a member from the cluster.
Go to /opt/splunk/bin.
1. Remove the member.

To run the splunk remove command from another member, use this version:

./splunk remove shcluster-member -mgmt_uri <URI>:<management_port>

Note the following:

mgmt_uri is the management URI of the member being removed from the cluster.

By removing the instance from the search head cluster, you automatically remove it from the KV store. To confirm that this instance has been removed from the KV store, run splunk show kvstore-status on any remaining cluster member. The instance should not appear in the set of results. If it does appear, there might be problems with the health of your search head cluster.
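A minimal shell sketch of the two steps above (remove, then confirm), run on a surviving member rather than on the failed one. The function names, the SPLUNK_BIN default path, and passing credentials with -auth are assumptions for illustration; the subcommands are the ones named in this thread, plus splunk show shcluster-status for the member list. The check only looks for the host as a whole word in the CLI output, which is a heuristic rather than a parse of the status format.

```shell
# Sketch only: wraps the splunk CLI calls discussed above.
# SPLUNK_BIN is an assumed default install path; override it if needed.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"

# Hypothetical helper: remove a (possibly already dead) member by mgmt URI.
# Run on any surviving member of the search head cluster.
remove_shc_member() {
    mgmt_uri="$1"   # e.g. https://<failed-member>:8089
    auth="$2"       # username:password of an admin user
    if [ -z "$mgmt_uri" ] || [ -z "$auth" ]; then
        echo "usage: remove_shc_member <mgmt_uri> <username:password>" >&2
        return 2
    fi
    "$SPLUNK_BIN" remove shcluster-member -mgmt_uri "$mgmt_uri" -auth "$auth"
}

# Hypothetical helper: return 0 if the removed host no longer appears in the
# SHC member list or the KV store status, 1 if it still appears, and 2 if a
# status command itself fails.
member_is_gone() {
    host="$1"       # IP or host name of the removed member
    auth="$2"
    for view in shcluster-status kvstore-status; do
        out=$("$SPLUNK_BIN" show "$view" -auth "$auth") || {
            echo "splunk show $view failed" >&2
            return 2
        }
        # -w: match the host as a whole word, so 10.0.0.4 does not match 10.0.0.41
        if printf '%s\n' "$out" | grep -qwF "$host"; then
            echo "$host still listed by 'splunk show $view'" >&2
            return 1
        fi
    done
    return 0
}
```

For example, on a surviving member: remove_shc_member https://<failed-member>:8089 admin:<password> && member_is_gone <failed-member-ip> admin:<password>. Capturing the status output before grepping means a failing CLI call is reported as an error instead of being mistaken for "member not listed".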

Let me know if this helps!

ramesh_babu71
Path Finder

@mayurr98
I read this document, but it specifically says to keep Splunk running on the instance being removed:

Remove the member
Caution: Do not stop the member before removing it from the cluster.

However, in my case that can't be done, as the server is already down and has been deleted (from the VM console).

Is it possible to run this command on another cluster member or the captain even if the target server is down?

mayurr98
Super Champion

Hey Ramesh, try this out:

The solution was to run the "splunk resync kvstore" command, as linked from the following thread:
Thereafter, run ./splunk remove shcluster-member -mgmt_uri <URI>:<management_port> on another SH member, using the IP of the member you want to remove.

As long as the current SHC is stable, in your situation you could potentially rebuild the SHC by following the doc below:
http://docs.splunk.com/Documentation/Splunk/6.5.2/DistSearch/Handleraftissues#Fix_the_entire_cluster

If only the KV store is complaining and the SHC itself is no longer looking for the removed SH node, a KV store resync will remove the node from the list. Please follow the doc below:
http://docs.splunk.com/Documentation/Splunk/6.5.2/Admin/ResyncKVstore

ramesh_babu71
Path Finder

Thanks @harsmarvania57 & @mayurr98
The step below worked even though the target Splunk instance was down.

./splunk remove shcluster-member -mgmt_uri <URI>:<management_port>

Now neither the console nor the bundle push shows the message about the Splunk instance being down. I didn't have to perform the KV store resync. If I do it later on (to fix related issues), I will update this thread.

I believe Splunk should update the documentation to say that this works even when you need to remove the entry of an SHC member that left abruptly. 🙂

mayurr98
Super Champion

Hey, thanks a lot! I am glad my solution helped you!

horsefez
SplunkTrust

Hi,

I'm not even sure how your setup with 4 SHs worked, as I believe you need an odd number (3, 5, 7...) of SHs at all times.

ramesh_babu71
Path Finder

@harsmarvania57
I read that document, but it says to keep Splunk running on the instance being removed:

Remove the member
Caution: Do not stop the member before removing it from the cluster.

However, in my case that can't be done, as the server is already down and has been deleted (from the VM console).

harsmarvania57
SplunkTrust

If it is a test environment, I'd try running that command anyway. I know the docs say Splunk should be running on the member you are trying to remove.

ramesh_babu71
Path Finder

Hmm... It was working fine. We use this environment for testing Splunk apps, and it worked fine until now. We had 3 CentOS servers and 1 Windows server in this setup. We even upgraded from 6.6 to 7.0, and it was still working fine until the disk crash on the Windows server.

The other instances in the cluster are still working fine, apart from the constant message about their missing Windows server amigo 😞

harsmarvania57
SplunkTrust

Hi @ramesh_babu71,

Please follow this document https://docs.splunk.com/Documentation/Splunk/7.0.1/DistSearch/Removeaclustermember to remove the member from the SH cluster, and then try deploying apps from the deployer.
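A minimal deployer-side sketch of that last step, assuming the failed member has already been removed: it re-runs the apply shcluster-bundle command from the original question, with the target pointed at a member that is still up. The function name and the SPLUNK_BIN default path are hypothetical.

```shell
# Sketch only: run on the deployer after the dead member has been removed.
# SPLUNK_BIN is an assumed default install path; override it if needed.
SPLUNK_BIN="${SPLUNK_BIN:-/opt/splunk/bin/splunk}"

# Hypothetical helper: push the configuration bundle via a live member.
push_shc_bundle() {
    target="$1"   # mgmt URI of any member that is still up, e.g. https://<live-member>:8089
    auth="$2"     # username:password
    if [ -z "$target" ] || [ -z "$auth" ]; then
        echo "usage: push_shc_bundle <target_mgmt_uri> <username:password>" >&2
        return 2
    fi
    "$SPLUNK_BIN" apply shcluster-bundle --answer-yes -target "$target" -auth "$auth"
}
```

If the "No route to host" error still names the removed member's address after this, the member entry was most likely not removed, and the removal step on a surviving member is worth re-checking first.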
