What's the best way to perform maintenance on a member of a search head cluster?

twinspop
Influencer

I need to perform some emergency maintenance on 1 member of my 4-member Search Head Cluster tonight. From the docs, it looks like I need to remove the target from the SHC, clean the Splunk install, perform my maintenance (including a reboot), then re-add the target member back to the cluster. This seems insane to me. Is that really the best practice?

Would it be easier to just take down the entire cluster while working on this one machine?

Steve_G_
Splunk Employee

It seems to me that you should be able to just stop the instance, perform your maintenance, and then add it back into the cluster using this fairly simple procedure: http://docs.splunk.com/Documentation/Splunk/6.5.0/DistSearch/Addaclustermember#Add_a_member_that_lef...

It shouldn't be necessary to perform the resync mentioned in that procedure if the maintenance occurs fairly quickly. In that case, you just need to restart the instance to add it back into the cluster.

Of course, it depends on what you mean by "maintenance". Presumably you won't be touching the Splunk configurations on the instance during the maintenance.

jkat54
SplunkTrust

Since it's an existing member, you're only going to be "down" briefly, and I assume you're not changing things under /etc/apps, I'd just do the maintenance and restart the bad boy.

Cons to this approach:
Could interfere with active users
Could interfere with summary indexing (hopefully you do summary indexing on a separate SHC or standalone SH)
Could force a captain re-election if this member is the captain

twinspop
Influencer

So many 'shoulds' and 'coulds' in both answers. The uncertainty and finger crossing involved with SHC makes me very nervous. Some day this will improve, right? RIGHT?! 🙂

jkat54
SplunkTrust

Point was cowboy up! and reboot that thang! ...

lol, to avoid the coulds, remove it first, in which case Steve's answer is the most appropriate.

Lucas_K
Motivator

Also check whether the member you're taking down is the current captain, and move captaincy to another member first if it is.

Check who the captain is via "splunk show shcluster-status".

Relocate it to another host via "splunk transfer shcluster-captain -mgmt_uri https://myhostthatiwanttobethenewcaptain:8089". This should be executed on the current captain instance.
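
The two commands above could be wrapped in a small script along these lines. This is only a sketch: the `run` helper, the `DRY_RUN` switch, and the default `/opt/splunk` install path are assumptions made for illustration, and only the `show shcluster-status` and `transfer shcluster-captain -mgmt_uri` commands come from this thread. By default it just prints what it would run.

```shell
#!/bin/sh
# Assumption: Splunk lives under /opt/splunk unless SPLUNK_HOME or SPLUNK says otherwise.
SPLUNK="${SPLUNK:-${SPLUNK_HOME:-/opt/splunk}/bin/splunk}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it for real when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show search head cluster status; the output identifies the current captain.
show_status() {
    run "$SPLUNK" show shcluster-status
}

# Transfer captaincy to the member whose management URI is passed as $1.
# Per the post above, run this on the current captain.
transfer_captain() {
    run "$SPLUNK" transfer shcluster-captain -mgmt_uri "$1"
}
```

For example, after sourcing the script, `show_status` followed by `transfer_captain https://myhostthatiwanttobethenewcaptain:8089` prints the two commands for review; setting `DRY_RUN=0` would actually execute them.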

twinspop
Influencer

Good tip!

twinspop
Influencer

I chose the simply-stop-it method. The server was down for about 50 minutes. I restarted it and, just for grins, ran splunk resync shcluster-replicated-config. It seems to be working okay.
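
As a rough sketch, the stop, maintain, restart, and optional resync sequence described here might look like the following. The `run` helper, the `DRY_RUN` switch, and the `/opt/splunk` default path are illustrative assumptions, not part of the original posts. The resync step mirrors what was done above "for grins", and per the accepted answer it shouldn't be necessary after a short outage.

```shell
#!/bin/sh
# Assumption: Splunk lives under /opt/splunk unless SPLUNK_HOME or SPLUNK says otherwise.
SPLUNK="${SPLUNK:-${SPLUNK_HOME:-/opt/splunk}/bin/splunk}"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it for real when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the Splunk instance on this member before starting maintenance.
stop_member() {
    run "$SPLUNK" stop
}

# Start Splunk again after maintenance (and any reboot); the member rejoins the cluster.
start_member() {
    run "$SPLUNK" start
}

# Optional: resync this member's replicated configuration from the captain.
resync_member() {
    run "$SPLUNK" resync shcluster-replicated-config
}
```

Run `stop_member` before the maintenance window, then `start_member` once the host is back, and `resync_member` only if you want the extra step. All three print their command unless `DRY_RUN=0` is set.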
