We have a multi-site (2 sites) environment with two 6-member SHCs. Each site is in a different physical location, and each site has 3 members of each SHC. I know I should probably have a majority in one site for each cluster, but I don't.
Next year, one site is being physically moved to a new location, which could be a 3-day outage. I'm trying to determine how best to handle that for my SHCs. If I just move them, I'll lose a majority of members and won't be able to elect a captain.
Some ideas I have:
1. Statically set the captain before the outage and make it dynamic again once the boxes are back up.
2. Remove the members that are being moved first, leaving a majority in the site that will remain up, then add them back after the migration.
3. Add a temporary search head to the cluster in the site that will remain up, giving it a majority.
I'm leaning toward 1 or 2. Any thoughts on the best approach? Does it matter?
If I understand correctly, you have 2 clusters with 6 search heads each, spread across 2 DCs: cluster1 has 3 in dc1 and 3 in dc2, and the same goes for cluster2. Is that correct? If so, why won't the remaining site be able to elect a new captain? You still have 3 search heads for each cluster in the site that stays up, so they can elect a new captain among themselves.
Hi adonio, you understand correctly.
It is my understanding that you need a majority to elect a captain. Yes, we'll have 3 search heads remaining, which is enough for a cluster. But the cluster is still considered to be a 6-member cluster even if the other 3 are unavailable, so we would still need 4 of 6 to elect a captain. If we remove some members before the migration, though, 3 would be enough for a majority.
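As a sanity check on that quorum math, you can see the current captain, member count, and each member's status from the Splunk CLI on any member. This is a sketch; the `$SPLUNK_HOME` path and credentials are placeholders for your environment:

```shell
# Show the current captain (and whether it is dynamic) plus the
# status of every member the cluster still counts toward quorum.
# Replace the credentials with your own admin account.
$SPLUNK_HOME/bin/splunk show shcluster-status -auth admin:changeme
```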
follow these steps to remove a member:
then after your site is back up, follow these steps to add the previously removed members:
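For reference, the CLI side of those two steps looks roughly like this. The hostnames, ports, and deployer URL below are placeholders; check the Splunk docs for your version before running anything:

```shell
# --- Before the outage: run on each member being moved ---
# Gracefully leave the cluster so the remaining members re-count
# quorum against a smaller cluster.
$SPLUNK_HOME/bin/splunk remove shcluster-member

# --- After the migration: run on each returning member ---
# Re-initialize the cluster config, then rejoin via any current member.
# All URIs below are examples only.
$SPLUNK_HOME/bin/splunk init shcluster-config \
    -mgmt_uri https://sh4.example.com:8089 \
    -replication_port 9777 \
    -conf_deploy_fetch_url https://deployer.example.com:8089
$SPLUNK_HOME/bin/splunk add shcluster-member \
    -current_member_uri https://sh1.example.com:8089
```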
If it's good, I'll convert this to an answer.
Thanks. Yeah, I know what to do for each option; I'm just looking for which approach is best, or whether it doesn't really matter (or maybe it's just subjective).
will convert to an answer, i think this is the way to go.
Safely remove the members of the site you will move.
Let the other members choose a captain.
Bring back the members when the site is up again.
Note that if you're going for alternative 1: even if the captain is static, it won't be able to actually make changes to the cluster, like distributing knowledge objects, as this would need a majority of the search heads to confirm the change. Check out this explanation of Raft: http://thesecretlivesofdata.com/raft/
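For completeness, this is roughly what converting to a static captain and back looks like on the CLI. The URIs are placeholders, and you should review the Splunk documentation on static captains (and their limitations) before relying on this during the outage:

```shell
# Run on the member that should act as static captain:
$SPLUNK_HOME/bin/splunk edit shcluster-config -mode captain \
    -captain_uri https://sh1.example.com:8089 -election false

# Run on each remaining non-captain member, pointing at that captain:
$SPLUNK_HOME/bin/splunk edit shcluster-config -mode member \
    -captain_uri https://sh1.example.com:8089 -election false

# After the moved site returns, revert every member to dynamic election:
$SPLUNK_HOME/bin/splunk edit shcluster-config -election true \
    -mgmt_uri https://sh1.example.com:8089
```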
If you're going for alternative 2, be sure to remove the search heads gracefully. Do not just kill the VMs (I've seen people do this before). There is a command (`splunk remove shcluster-member`) to properly take a search head out of a search head cluster.
I think alternative 3 is a neat solution. You could add a low-spec SH on that site, but exclude it from the load balancer, so that users will still only use the remaining 3 high-spec SHs. After the migration is done, gracefully remove the low-spec SH and delete the VM.
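A rough sketch of adding and later removing that temporary member, assuming a fresh Splunk install on the temp box and placeholder URIs throughout:

```shell
# On the temporary low-spec search head (clean instance):
$SPLUNK_HOME/bin/splunk init shcluster-config \
    -mgmt_uri https://sh-temp.example.com:8089 \
    -replication_port 9777 \
    -conf_deploy_fetch_url https://deployer.example.com:8089
$SPLUNK_HOME/bin/splunk add shcluster-member \
    -current_member_uri https://sh1.example.com:8089

# After the migration, gracefully remove it again
# (run on the temporary member itself) before deleting the VM:
$SPLUNK_HOME/bin/splunk remove shcluster-member
```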
1. Safely remove members from the old site, following the Splunk docs:
2. The existing members will elect a new captain.
3. When the new site is up, safely add the members back, following the docs:
Splunk best practice says to always use clean instances when adding members to a SHC. If you remove SH instances from the cluster for a while and then attempt to add them back in, changes could have been made on the SHC in the meantime, meaning that the removed instances would no longer be in sync with the cluster. As far as I know this is not supported, but I'm just thinking out loud here.
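One way to return a previously removed member to a clean state before re-adding it, assuming nothing on that instance needs to be preserved (this is destructive, so treat it as a sketch and verify against the docs for your version):

```shell
# WARNING: "clean all" wipes indexed data and user data on this
# instance. Only do this on a member you intend to rejoin as a
# clean instance.
$SPLUNK_HOME/bin/splunk stop
$SPLUNK_HOME/bin/splunk clean all -f
$SPLUNK_HOME/bin/splunk start
# Then run init shcluster-config / add shcluster-member as usual.
```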