What is the best approach to migrate an existing single-site indexer cluster to a multi-site cluster?

guilmxm
Influencer

Hi Splunkers,

We are going to migrate our current single-site indexer cluster (4 nodes, replication factor 2, search factor 2, multiple TB of raw data) to a new multi-site cluster spanning 2 data centers.

Currently, the cluster is running fine on these 4 nodes, and we have a new set of 4 physical servers ready, so we will migrate from single-site to multi-site and also from the old servers to the new servers.

The current cluster holds several TB of raw data of almost every possible type, structured and unstructured, with retention ranging from low (1 month to several months) to very high (1 year to several years).

The final multi-site configuration must meet a high SLA level and comply with our data center recovery plan.

I am looking for advice and real-world feedback to build the best possible scenario, so that this migration succeeds with the lowest possible operational effort, which is why I am asking today.

First, we are aware that it is (unfortunately) not possible to convert existing single-site buckets, which carry no origin-site information, into multi-site buckets. (http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Migratetomultisite)
That is something really missing; I hope Splunk will implement it one day...

"The cluster will never create a new copy of the bucket on a non-origin site."

What would be the best approach for the migration scenario?

We have some ideas, but I would very much appreciate any help and advice:

Scenario 1: Perform a standard migration, let natural retention take care of the single-site buckets, and manually export / import data for indexes with a high retention

The first scenario would be the following:

  • Integrate the 4 new physical servers into the current single-site cluster
  • Move the indexing flow from the old nodes to the new nodes
  • One by one, decommission the old indexer nodes and remove them from the cluster
  • Convert the single-site cluster to a multi-site cluster with 2 nodes on each site
  • Export data for each index with a high retention policy, and re-index these data sets into a new index of the cluster (so that the buckets are multi-site)
  • Remove the migrated indexes
  • Wait the required time for the retention policy to delete the buckets with no origin-site identification
  • Ensure we have no more single-site buckets
  • Run data disaster recovery plan tests
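
The "wait for retention" step can be estimated ahead of time. The following is only a rough sketch, assuming purely time-based retention (the equivalent of `frozenTimePeriodInSecs` expressed in days) and ignoring size-based limits and the time span covered by each bucket. The index names, retention values and migration date are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical indexes with their retention in days (illustrative values only).
RETENTION_DAYS = {
    "web_logs": 30,
    "app_metrics": 90,
    "audit": 730,
}


def legacy_free_dates(migration_day, retention_days):
    """Return, per index, the approximate date by which the newest
    pre-migration (single-site) bucket should have aged out."""
    return {
        index: migration_day + timedelta(days=days)
        for index, days in retention_days.items()
    }


def needs_manual_export(retention_days, max_wait_days):
    """List indexes whose single-site buckets would outlive the acceptable wait."""
    return sorted(
        index for index, days in retention_days.items() if days > max_wait_days
    )


if __name__ == "__main__":
    migration_day = date(2024, 1, 1)  # hypothetical conversion date
    free_dates = legacy_free_dates(migration_day, RETENTION_DAYS)
    for index in sorted(free_dates):
        print(f"{index}: single-site buckets gone around {free_dates[index].isoformat()}")
    print("export candidates:", needs_manual_export(RETENTION_DAYS, max_wait_days=180))
```

Indexes returned by `needs_manual_export` are the ones that would fall under the export / re-index step; everything else simply ages out.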

Manually exporting data and re-indexing it is a pain, and will cost time and money...

But this looks like the best approach. Any ideas?

Scenario 2:

Another migration scenario would be:

  • Build the new multi-site cluster
  • Set our search head cluster to address both the single-site cluster and the multi-site cluster (not even sure this is possible)
  • Move indexing flow from old nodes to the new multi-site cluster
  • Back up unique buckets from the single-site cluster
  • Restore onto 1 node of Site 1 and 1 node of Site 2 (so that we have at least one copy of all data on each site, including non-origin buckets)
  • Set the search head cluster to search only on the new multi-site cluster
  • Decommission the old single-site cluster

This scenario could be reliable too (if we succeed in backing up / restoring!), but the bad thing here is that non-origin buckets will never be managed by the cluster, if I am not wrong.

If we someday need to migrate one of the nodes that were restored with non-origin buckets, we will have to migrate these buckets manually, which is not really compliant with a cluster philosophy...
And finally, if the node holding the restored buckets on a site becomes unavailable, past data would not be available on that site.
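
For the "back up unique buckets" step, one possible way to pick exactly one copy of each bucket across the old peers is sketched below. It assumes the usual clustered bucket directory naming, `db_<newest>_<oldest>_<localid>_<guid>` for originating copies and `rb_...` for replicated copies, and it prefers the originating copy. The peer names and the inventory are hypothetical; please check the naming against your own peers before relying on it.

```python
import re

# Assumed warm/cold bucket directory pattern on clustered peers:
#   db_<newest>_<oldest>_<localid>_<guid>  (originating copy)
#   rb_<newest>_<oldest>_<localid>_<guid>  (replicated copy)
BUCKET_RE = re.compile(
    r"^(?P<kind>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)"
    r"_(?P<local_id>\d+)_(?P<guid>[0-9A-Fa-f-]+)$"
)


def bucket_key(index, dirname):
    """Return (index, local_id, guid), or None for non-matching names (e.g. hot buckets)."""
    match = BUCKET_RE.match(dirname)
    if match is None:
        return None
    return (index, int(match["local_id"]), match["guid"].upper())


def plan_backup(inventory):
    """inventory: iterable of (peer, index, dirname).
    Returns {bucket_key: (peer, dirname)} with one copy per bucket,
    preferring the originating db_ copy over a replicated rb_ copy."""
    chosen = {}
    for peer, index, dirname in inventory:
        key = bucket_key(index, dirname)
        if key is None:
            continue
        have_origin = key in chosen and chosen[key][1].startswith("db_")
        if key not in chosen or (dirname.startswith("db_") and not have_origin):
            chosen[key] = (peer, dirname)
    return chosen


def missing_per_site(unique_keys, restored):
    """restored: {site: set of bucket keys}. Returns the keys each site still lacks."""
    return {site: sorted(set(unique_keys) - keys) for site, keys in restored.items()}
```

After the restore, `missing_per_site(plan_backup(inventory), restored)` should return an empty list for every site if each site really holds one copy of everything.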

Scenario 3:

Finally we could export / re-index every piece of data from the old single-site cluster to the new cluster.

This looks like the "cleanest" solution, as every bucket will be a multi-site bucket.

  • Build the new multi-site cluster
  • Move indexing flow from old nodes to the new multi-site cluster
  • Set the search head cluster to address the new multi-site cluster
  • Export every piece of data
  • Re-index
  • Decommission the old single-site cluster

Et voila 🙂

This would be magic... but I have serious doubts about exporting multiple TB of heterogeneous raw data and re-indexing it while respecting every application-specific detail, such as metadata rewriting and so on...
I do not personally know of any robust, industrial-strength way to export / re-index data at high volume and high data complexity...
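
To put a number on that doubt, here is a back-of-the-envelope sketch of how long a full re-index could take. It assumes the re-index can only use whatever daily indexing headroom is left after live ingest; all figures are hypothetical and should be replaced with measured values.

```python
import math


def spare_headroom_gb(daily_limit_gb, live_ingest_gb):
    """Daily volume left for re-indexing once live ingest is accounted for."""
    return max(daily_limit_gb - live_ingest_gb, 0)


def reindex_days(total_gb, spare_gb_per_day):
    """Number of whole days needed to re-index total_gb at the given daily headroom."""
    if spare_gb_per_day <= 0:
        raise ValueError("no spare indexing headroom available")
    return math.ceil(total_gb / spare_gb_per_day)


if __name__ == "__main__":
    # Hypothetical figures: 5 TB to re-index, 500 GB/day limit, 300 GB/day live ingest.
    spare = spare_headroom_gb(daily_limit_gb=500, live_ingest_gb=300)
    print(f"about {reindex_days(5000, spare)} days at {spare} GB/day")
```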

Thank you in advance for sharing your advice, opinions, feedback, and even better solutions!
I may be wrong on some points, so don't hesitate to tell me 🙂

Thanks

Guilhem

maciep
Champion

I highly doubt that I'll add any value to what you have already, but keep us posted. I'm curious how this goes.

First thought I would have in your situation is whether exporting all of the high-retention data would be enough of a DR plan in itself. Meaning, if you lose a data center and you need that data, you do have it. You may not have it in Splunk, but you have it. If that's an acceptable addendum to the DR plan for the remainder of the retention period, then it could save you some headaches in trying to address those single-site buckets.

Second idea, although not well thought out at all, is whether Splunk can be tricked into replicating the data by changing the site on a peer. I mean, if you know all of the single-site buckets are going to be assigned to site 1, can you configure one of your peers that is physically at site 2 to be site 1, and then shut down one of the site 1 peers to force replication over to site 2? And then reconfigure it to be site 2. Again, no idea if that makes sense, let alone whether it's doable. Or whether those would end up being excess buckets and eventually deleted.

Anyway, good luck!

guilmxm
Influencer

Hi! Thank you for your answer 🙂 And for your ideas!

The second idea could make sense... I wouldn't even have thought of that!
Some headaches that I already have 🙂

For sure, I will update this post in any case.

Cheers

jbrooke
New Member

I am curious, how did you end up migrating the data?

guilmxm
Influencer

Hello !

I would be curious too 🙂

So far, we have migrated all the virtual nodes to new dedicated physical servers, still in the same single-site indexer cluster, and took the opportunity to update Splunk.
In a few weeks we will migrate from single-site to multi-site.

As there is no fully satisfying solution, we ended up deciding that, since the larger indexes (which are also the critical indexes that receive large data volumes and must be covered by the DC disaster recovery plan) have limited retention, the situation will become compliant after a few months of running multi-site.

Other data with long retention (from more than 6 months to x years) is not critical, and there is no need for it to be part of the DC recovery plan.
"Hot" data will always be available and covered by the DC recovery plan, and the Splunk service will remain fully available for these critical scopes, which is what the plan requires.

If we identify long-term data that is critical, we will have to export and re-index it. As this can represent a lot of work, it will be kept to a minimum.

Will update 🙂

nmohammed
Builder

Hi Guilmxm,

How did your data migration go? We're planning a similar activity in which we want to migrate an existing Splunk deployment, with its data, to new hosts. Most of our data is critical and, again, a huge volume (TB) with long retention policies.

We will obviously have a new cluster set up and then update outputs.conf to send data to the new cluster, but for old data that remains critical, how can we keep it searchable if we decide to decommission the old servers soon?

Any ideas/thoughts from the community would be appreciated.
