
Best way to move half of a SHC?

maciep
Champion

We have a multi-site (2 sites) environment with two 6-member SHCs. Each site is in a different physical location, and each site has 3 members of each SHC. I know, I should probably have a majority of members in one site for each cluster, but I don't.

Next year, one site is being physically moved to a new location, which could mean a 3-day outage. I'm trying to determine how best to handle that for my SHCs. If I just move them, I'll lose a majority of members and won't be able to elect a captain.

Some ideas I have:

  1. Statically set the captain before the outage and make it dynamic again once the boxes are back up

  2. Remove one or more of the members that are being moved first, leaving a majority in the site that will remain up, and then add them back after the migration

  3. Add a temporary search head to the cluster in the site that will remain up, giving that site a majority

I'm leaning toward 1 or 2. Any thoughts on the best approach? Does it matter?

Thanks,

1 Solution: adonio's three-step answer, shown in full further down the thread (remove the members, let the remaining members elect a captain, add the members back).

koshyk
Super Champion

my suggestion:
1. Remove the members gracefully. Ensure the cluster knows it has only 3 members.
2. Since there are 3 members, the cluster will still work (3 is the minimum to elect a captain).
3. Stop the SHs which you have removed. Take a KV store backup and a separate copy of all the apps (just in case).
4. Wipe everything clean before adding them at the new site.
5. Add the search heads at the new site, bootstrap, and add them to the cluster.
6. Deploy the SH bundle from the deployer.
7. Add the new SHs to the "DMC/Monitoring Console".


maciep
Champion

Thanks for the suggestion, koshyk. I think I'd be OK with just stopping Splunk and running splunk clean before shutting the boxes off. Not sure I need or want to remove Splunk entirely and re-install.


adonio
Ultra Champion

My suggestion:
1. Safely remove the members from the old site, following the Splunk docs:
https://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Removeaclustermember
2. The remaining members will elect a new captain.
3. When the new site is up, safely add the members back, following the docs:
http://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Addaclustermember#Add_a_member_that_was...
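
The remove and re-add flow above could be written down as a small checklist generator so the same steps get run on every member. This is only a sketch under assumptions: the install path and the sh1.example.com URI are hypothetical, and the CLI commands are recalled from the linked Splunk docs, so please confirm them against the documentation for your version before running anything.

```python
from typing import List

# Assumption: default Linux install path; adjust for your environment.
SPLUNK = "/opt/splunk/bin/splunk"


def removal_steps() -> List[str]:
    """Commands to run on each member that is about to be moved."""
    return [
        f"{SPLUNK} remove shcluster-member",  # run on the member being removed
        f"{SPLUNK} stop",                     # stop it once the removal has settled
    ]


def verify_step() -> str:
    """Run on a remaining member to confirm the member list and captain."""
    return f"{SPLUNK} show shcluster-status"


def rejoin_steps(current_member_uri: str) -> List[str]:
    """Commands to run on a previously removed member once the new site is up."""
    return [
        f"{SPLUNK} stop",       # clean only works on a stopped instance
        f"{SPLUNK} clean all",  # wipe stale state before rejoining
        f"{SPLUNK} start",
        f"{SPLUNK} add shcluster-member -current_member_uri {current_member_uri}",
    ]


if __name__ == "__main__":
    # Hypothetical management URI of a member that stayed in the cluster.
    existing = "https://sh1.example.com:8089"
    for cmd in removal_steps() + [verify_step()] + rejoin_steps(existing):
        print(cmd)
```

Printing the commands rather than executing them keeps this a checklist; credentials and confirmation prompts are deliberately left to whoever runs each step.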


maciep
Champion

I think this will be the best approach. Even if I add another member to the cluster, I think the 3 that go down for the move will have so much catching up to do that I may as well just remove them from the cluster to address the situation.


hettervik
Builder

Splunk best practice says to always use clean instances when adding members to a SHC. If you remove SH instances from the cluster for a while and then attempt to add them back in, changes could have been made on the SHC in the meantime, meaning that the removed instances would no longer be in sync with the cluster. As far as I know this is not supported, but I'm just thinking out loud here.


adonio
Ultra Champion
0 Karma

hettervik
Builder

Right, thx. Still need to clean the instances though.


hettervik
Builder

Note that if you're going for alternative 1, even if the captain is static, it won't be able to actually make changes to the cluster, like distributing knowledge objects, as this would need a majority of the search heads to confirm the change. Check out this explanation of Raft: http://thesecretlivesofdata.com/raft/

If you're going for alternative 2, be sure to remove the search head gracefully. Do not just kill the VM (I've seen people do this before). There is a command to properly take a search head out of a search head cluster.

I think alternative 3 is a neat solution. You could add a low-spec SH on that site but leave it out of the load balancer, so that users will still only use the remaining 3 high-spec SHs. After the migration is done, gracefully remove the low-spec SH and delete the VM.

maciep
Champion

I did not realize that was the case with a static captain. I think that may rule out that option. Thanks!


adonio
Ultra Champion

If I understand correctly, you have 2 clusters with 6 search heads each, spread across 2 DCs, so cluster1 has 3 in dc1 and 3 in dc2, and the same goes for cluster2. If this is correct, why won't 1 site be able to elect a new captain? You still have 3 search heads for each cluster in the site that remains up, so they can elect a new captain between them.

maciep
Champion

hi adonio, you understand correctly.

It is my understanding that you need a majority to elect a captain. Yes, we'll have 3 search heads remaining, which is enough for a cluster. But the cluster is still considered to be a 6-member cluster even if the other 3 are unavailable, so we would still need 4 of 6 to elect a captain. If we remove the moving members before the migration, the cluster shrinks to 3 members and the 3 that remain would be enough for a majority.
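
To make the majority arithmetic concrete, here is a minimal sketch that assumes a strict majority of the configured members is needed to elect a captain, as described above. The function names are made up for illustration and are not part of any Splunk API; the three scenarios are the plain move, removing the moving members first, and the temporary-member idea from the original question.

```python
def majority(configured: int) -> int:
    """Smallest member count that is more than half of the configured total."""
    return configured // 2 + 1


def can_elect(configured: int, reachable: int) -> bool:
    """True if the reachable members form a majority of the configured cluster."""
    return reachable >= majority(configured)


def spare_failures(configured: int, reachable: int) -> int:
    """How many more members can fail before the majority is lost.

    A negative value means the majority is already lost.
    """
    return reachable - majority(configured)


# (configured members, members still reachable during the move)
scenarios = {
    "move as-is": (6, 3),
    "remove the 3 moving members first": (3, 3),
    "add 1 temporary member at the remaining site": (7, 4),
}

for name, (configured, reachable) in scenarios.items():
    print(
        f"{name}: need {majority(configured)} of {configured}, "
        f"can elect: {can_elect(configured, reachable)}, "
        f"spare failures: {spare_failures(configured, reachable)}"
    )
```

Under that assumption, removing the moving members leaves room for one further failure during the outage, while the temporary-member approach has a majority with nothing to spare.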


adonio
Ultra Champion

Follow these steps to remove a member:
https://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Removeaclustermember
Then, after your site is back up, follow these steps to add the previously removed members:
http://docs.splunk.com/Documentation/Splunk/6.6.0/DistSearch/Addaclustermember#Add_a_member_that_was...
If it's good, I will convert this to an answer.


maciep
Champion

thanks. yeah, I know what to do for each option, just looking for which approach is best or if it doesn't really matter (or maybe it's just subjective)


adonio
Ultra Champion

will convert to an answer, i think this is the way to go.
safely remove the members of the site you will move.
let the other members choose a captain.
bring back the members when site is up again
cheers
