Greetings, everyone.
I apologize if this question has been answered before, but I need a better understanding of how to proceed here. We currently have two Splunk Enterprise indexer clusters. One is our prod infrastructure, spanning two geo-separated datacenters with 8 nodes total, 4 at each site. The other is nonprod, a very similar setup but at a single physical site, with 4 nodes making up the cluster.
We have recently been asked to help migrate these clusters to brand-new physical servers and have questions about the best way to proceed. First, the hot tier lives on local SSD storage arrays on our current physical hosts, and our "colddb" sits on a chunk of SAN storage connected over Fibre Channel. This is where the wrinkle is. We are not getting new SAN storage for "colddb", so we cannot stand up a new server, add it to the cluster as a 9th node, let it replicate, remove the node it replaces to get back to 8, and repeat for every node. Instead, we will have to remove the SAN allocation from each old node and attach it to the new node, which makes that type of migration impossible.
My initial assumption is that instead we will need to decommission a node and replace it with a new node, one at a time, as if the node had failed. Am I correct in this assumption?
Is there a better way to handle this, or am I stuck with the current situation? Thanks for your time.
Hi
Here is an old post that describes how we did it: https://community.splunk.com/t5/Splunk-Enterprise/Migration-of-Splunk-to-different-server-same-platf...
r. Ismo
Thanks for the link; however, I don't feel it addresses my main concern, which is that I don't have new storage to use during this process and would have to decommission a node before installing the new one. That seems dangerous.
If you cannot borrow additional disk space for colddb for the duration of the migration, then you have two options: 1) decommission one peer at a time and let the cluster re-replicate onto the replacement node, as you described, or 2) clone each old node onto its new server and move the SAN allocation across manually.
The 1st one is the much safer and easier option. The 2nd one can lead to situations where you could lose events in the worst case.
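For option 1, a minimal sketch of the per-node step, assuming you run the standard splunk CLI on each peer ($SPLUNK_HOME being wherever Splunk is installed):

```
# Run on the old peer being retired. With --enforce-counts the peer stays up
# until its buckets have enough copies elsewhere to satisfy the replication
# and search factors, then shuts itself down.
$SPLUNK_HOME/bin/splunk offline --enforce-counts

# From the cluster manager, check that all fixup tasks have finished before
# you retire the next node:
$SPLUNK_HOME/bin/splunk show cluster-status
```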
With option 2 the rough procedure is (sketched below):
1. Set up the new node without the SAN, and replicate the SSD storage, Splunk software, and configuration from the old node. If you are using rpm or deb packages, install the package first, then use rsync to copy the data and configuration across. Make sure you carry over splunk.secret, the instance GUID, and all other configuration from the old node.
2. Shut down the old instance and do a final rsync from it with the delete option so the copy is exact.
3. Detach the SAN disk and move it to the new node. Ensure the volume group and file system definitions are correct and that permissions are right.
4. Bring the new node up in place of the old one.
It's probably also a good idea to increase some cluster timeouts so the other peers don't start unneeded bucket replication while the node is being swapped (see the server.conf example below).
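For the clone-and-move part, here is a minimal sketch under a few assumptions: Splunk is installed under /opt/splunk, the new host can reach the old one over ssh, and the hostname old-idx1 and the colddb mount path are placeholders. Treat it as an outline of the idea, not a drop-in script.

```
# Sketch of option 2 for one indexer (run on the NEW node unless noted).
# Assumptions: /opt/splunk install path, ssh access to old-idx1, and the same
# Splunk version already installed from the rpm/deb package on the new node.

# 1. Pre-seed configuration and hot/warm data while the old peer is still running.
#    etc/ carries splunk.secret (etc/auth/splunk.secret) and the instance GUID
#    (etc/instance.cfg), which the new node must keep.
rsync -a old-idx1:/opt/splunk/etc/ /opt/splunk/etc/
rsync -a old-idx1:/opt/splunk/var/lib/splunk/ /opt/splunk/var/lib/splunk/

# 2. On the old node: stop Splunk so the data no longer changes.
#    ssh old-idx1 '/opt/splunk/bin/splunk stop'

# 3. Final delta sync with --delete so files removed on the old node are also
#    removed from the copy.
rsync -a --delete old-idx1:/opt/splunk/etc/ /opt/splunk/etc/
rsync -a --delete old-idx1:/opt/splunk/var/lib/splunk/ /opt/splunk/var/lib/splunk/

# 4. Move the SAN LUN to the new host, mount colddb at the same path it used on
#    the old node, and make ownership match the user Splunk runs as, e.g.:
#    mount /dev/mapper/<colddb-vg>-<lv> /opt/splunk/colddb   # placeholder paths
#    chown -R splunk:splunk /opt/splunk/colddb

# 5. Start Splunk on the new node; with the same GUID and configuration it
#    rejoins the cluster in place of the old peer.
/opt/splunk/bin/splunk start
```

For the "increase some timeouts" part, the setting I'd look at is restart_timeout in the [clustering] stanza of server.conf on the cluster manager; it controls how long the manager waits for a peer to come back before it starts bucket fixup on the other peers. The value below is only an example, size it to your expected swap window. Putting the cluster into maintenance mode on the manager for the duration of the swap is another common way to hold off fixup.

```
# server.conf on the cluster manager (example value, not a recommendation)
[clustering]
# Wait up to 30 minutes for a restarting peer before starting bucket fixup.
restart_timeout = 1800
```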
As you can see, this is a somewhat complicated procedure, but it's doable. Fortunately, you have a test environment where you can practice it and write step-by-step instructions for the production migration.