We have a multi-site (2 sites) environment with two 6-member SHCs. Each site is in a different physical location, and each site has 3 members of each SHC. I know, I should probably have a majority of members in one site for each cluster, but I don't.
Next year, one site is being physically moved to a new location, which could mean a 3-day outage. I'm trying to determine how best to handle that for my SHCs. If I just move them, I'll lose a majority of members and won't be able to elect a captain.
Some ideas I have:
1. Statically set the captain before the outage and make it dynamic again once the boxes are back up
2. Remove one or more of the members that are being moved first, leaving a majority in the site that will remain up, and then add them back after the migration
3. Add a temporary search head to the cluster in the site that will remain up, giving it a majority
I'm leaning toward 1 or 2. Any thoughts on the best approach? Does it matter?
Thanks,
my suggestion:
1. safely remove members from old site following splunk docs:
https://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Removeaclustermember
2. existing members will elect new captain
3. when new site is up, safely add members following docs:
http://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Addaclustermember#Add_a_member_that_was...
my suggestion:
1. Remove members gracefully. Ensure the cluster knows it has only 3 members
2. Since there are 3 members, the cluster would work (minimum is 3 to elect a captain)
3. Stop the SHs which you have removed. Take a KV store backup and a separate copy of all the apps (just in case)
4. Wipe everything clean before adding to the new site
5. Add the search heads in the new site, bootstrap and add them to the cluster
6. Deploy SHbundle from deployer
7. Add new SH's to "DMC/Monitoring Console"
thanks for the suggestion, koshyk. I think I'd be ok with just stopping Splunk and running splunk clean before shutting the boxes off. Not sure I need or want to remove Splunk entirely and re-install.
I think removing the members before the move and adding them back afterwards will be the best approach. Even if I add another member to the cluster, I think the 3 that go down for the move will have so much catching up to do that I may as well just remove them from the cluster to address the situation.
Splunk best practice says to always use clean instances when adding members to a SHC. If you remove SH instances from the cluster for a while and then attempt to add them back in, changes could have been made on the SHC in the meantime, meaning that the removed instances would no longer be in sync with the cluster. As far as I know this is not supported, but I'm just thinking out loud here.
kindly read the docs:
http://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Addaclustermember
Right, thx. Still need to clean the instances though.
Note that if you're going for alternative 1, even if the captain is static, it won't be able to actually make changes to the cluster, like distributing knowledge objects, as this would need a majority of the search heads to confirm the change. Check out this explanation of Raft: http://thesecretlivesofdata.com/raft/
If you're going for alternative 2, be sure to remove the search head gracefully. Do not just kill the VM (I've seen people do this before). There is a command to properly take a search head out of a search head cluster (splunk remove shcluster-member).
I think alternative 3 is a neat solution. You could add a low spec SH on that site, but exclude it from the load balancer, so that the users will still only use the remaining 3 high spec SHs. After the migration is done, gracefully remove the low spec SH and delete the VM.
I did not realize that was the case with a static captain. I think that may rule out that option. Thanks!
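To see why a single temporary search head is enough for alternative 3, here is a rough sketch of the arithmetic. It assumes the majority rule discussed in this thread (more than half of all configured members, whether they are up or not); the function names are made up for illustration and are not part of any Splunk tooling.

```python
def majority(member_count: int) -> int:
    """Votes needed to win an election: more than half of all members."""
    return member_count // 2 + 1


def temp_members_needed(total: int, going_offline: int) -> int:
    """Smallest number of temporary members to add at the surviving site
    so that the members still running form a majority of the cluster."""
    added = 0
    # Each temporary member raises both the running count and the cluster size.
    while (total - going_offline + added) < majority(total + added):
        added += 1
    return added


# 6-member cluster, 3 members go down with the moving site.
print(temp_members_needed(total=6, going_offline=3))  # 1 (4 running out of 7)
```

With one extra member the cluster has 7 members, 4 of them running, and 4 is the majority of 7.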
If I understand correctly, you have 2 clusters with 6 search heads each, spread across 2 DCs, so cluster1 has 3 in dc1 and 3 in dc2, and the same goes for cluster2. Is this correct? If so, why won't 1 site be able to elect a new captain? You still have 3 search heads for each cluster in the site that remains, so they can elect a new captain between them.
hi adonio, you understand correctly.
It is my understanding that you need a majority to elect a captain. Yes, we'll have 3 search heads remaining, which is enough for a cluster. But the cluster is still considered to be a 6-member cluster even if the other 3 are unavailable, so we would still need 4 of 6 to elect a captain. If we remove the moving members before the migration, the cluster becomes a 3-member cluster and the 3 remaining members would be a majority.
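The numbers above can be checked with a small sketch. It only encodes the rule as described in this post (a captain needs votes from more than half of all configured members, counting the ones that are down); the function name and scenario labels are illustrative, not anything Splunk provides.

```python
def can_elect_captain(configured_members: int, running_members: int) -> bool:
    """True if the running members can reach a majority of the whole cluster."""
    votes_needed = configured_members // 2 + 1
    return running_members >= votes_needed


# (configured members, running members) during the outage
scenarios = {
    "move the site as-is": (6, 3),
    "remove the 3 moving members first": (3, 3),
}

for name, (configured, running) in scenarios.items():
    print(f"{name}: {running} of {configured} up, "
          f"can elect captain = {can_elect_captain(configured, running)}")
```

The first scenario comes out False (3 is short of the 4 votes needed), the second comes out True (only 2 of 3 votes are needed).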
follow these steps to remove a member:
https://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Removeaclustermember
then after your site is back up, follow these steps to add the previously removed members:
http://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Addaclustermember#Add_a_member_that_was...
if it's good, will convert to an answer
thanks. yeah, I know what to do for each option, just looking for which approach is best or if it doesn't really matter (or maybe it's just subjective)
will convert to an answer, i think this is the way to go.
safely remove the members of the site you will move.
let the other members choose a captain.
bring back the members when site is up again
cheers