We are going to migrate our current single-site indexer cluster (4 nodes, replication factor 2, search factor 2, multiple TB of raw data) to a new multi-site cluster spanning 2 data centers.
The cluster is currently running fine on these 4 nodes, and we have a new set of 4 physical servers ready, so we will migrate from single-site to multi-site and from the old servers to the new ones at the same time.
The current cluster holds several TB of raw data of almost every possible type, structured and unstructured, with both low retention (from 1 month to several months) and very high retention (from 1 year to several years).
The final multi-site configuration must meet a high SLA and comply with our data center disaster recovery plan.
I am looking for advice and real-world feedback to build the best possible scenario for this migration, with the lowest possible operational overhead, which is why I am asking today.
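For context, the multi-site / DR requirement above usually comes down to a handful of server.conf settings on the cluster master. A minimal sketch, with illustrative site names and factor values (not our actual ones):

```ini
# server.conf on the cluster master -- illustrative values only
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# keep at least one searchable copy per site, so either data center
# can serve searches alone after a DC loss
site_replication_factor = origin:2,total:4
site_search_factor = origin:1,total:2

# server.conf on each peer: declare which site it belongs to
# [general]
# site = site2
```

The `origin:X,total:Y` syntax means X copies on the site where the data originated and Y copies cluster-wide; the exact values depend on how many peers each site has.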
First, we are aware that it is (unfortunately) not possible to migrate single-site buckets, which have no site in their origin GUID, to multi-site clusters. (http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Migratetomultisite)
That capability is really missing; I hope Splunk will implement it one day...
"The cluster will never create a new copy of the bucket on a non-origin site."
What would be the best approach for this migration scenario?
We have some ideas, but I would very much appreciate any help and interesting advice:
Scenario 1: Perform a standard migration, let natural retention age out the single-site buckets, and manually export / import the data for indexes with high retention.
Manually exporting and re-indexing data is a pain, and it will cost time and money...
But this looks like the best approach. Any thoughts?
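"Letting retention solve it" can be sized per index from `frozenTimePeriodInSecs` in indexes.conf; short-retention indexes clean themselves up, while long-retention ones are the export / re-index candidates. A sketch with made-up index names and periods:

```ini
# indexes.conf -- illustrative names and retention values only
[short_retention_index]
# ~90 days: single-site buckets age out of the cluster on their own
frozenTimePeriodInSecs = 7776000

[long_retention_index]
# ~3 years: these indexes would need the manual export / re-index path
frozenTimePeriodInSecs = 94608000
```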
Another migration scenario would be to back up the single-site buckets and restore them onto the new multi-site cluster:
This scenario could be reliable too (if we succeed in backing up / restoring!), but the downside here is that non-origin buckets will never be managed by the cluster, if I am not wrong.
If we someday need to migrate one of the nodes that were restored with non-origin buckets, we will have to migrate these buckets manually, which is not really compliant with the cluster philosophy...
And finally, if the nodes holding the restored buckets become unavailable on either site, that past data would not be available.
Finally, we could export / re-index every piece of data from the old single-site cluster to the new cluster.
This looks like the "cleanest" solution, as every bucket would then be a multi-site bucket.
Et voilà 🙂
This would be magic... but I have serious doubts about exporting multiple TB of heterogeneous raw data and re-indexing it while respecting every application specificity, like metadata rewriting and so on...
I do not personally know of robust, industrial-grade ways to export / re-index data at this volume and complexity...
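For what it's worth, the only built-in path I know of is `splunk cmd exporttool`, which dumps one bucket at a time. A rough dry-run sketch that only prints the commands (the `SPLUNK_HOME` default, example paths, and the `-csv` flag are assumptions to verify against your Splunk version before using):

```shell
# Sketch only: print (not run) one exporttool command per bucket
# found in an index's db directory.
print_export_cmds() {
    idx_db=$1    # e.g. /opt/splunk/var/lib/splunk/defaultdb/db
    out_dir=$2   # destination directory for the exported csv files
    for bucket in "$idx_db"/db_*; do
        # skip if the glob matched nothing or the entry is not a directory
        [ -d "$bucket" ] || continue
        echo "${SPLUNK_HOME:-/opt/splunk}/bin/splunk cmd exporttool" \
             "$bucket $out_dir/$(basename "$bucket").csv -csv"
    done
}

# Review the generated commands first, then pipe them to sh to execute:
#   print_export_cmds /opt/splunk/var/lib/splunk/defaultdb/db /backup/export
```

Even with this, the re-indexing side (metadata, sourcetypes, timestamps) remains the hard part at TB scale.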
Thank you in advance for sharing your advice, opinions, feedback, and even better solutions!
I may be wrong on some points, so don't hesitate to say so 🙂
I highly doubt that I'll add any value to what you have already, but keep us posted. I'm curious how this goes.
First thought I would have in your situation is whether exporting all of the high-retention data would be enough of a DR plan by itself. Meaning, if you lose a data center and you need that data, you do have it. You may not have it in Splunk, but you have it. If that's an acceptable addendum to the DR plan for the remainder of the retention period, then it could save you some headaches in trying to address those single-site buckets.
Second idea, although not well thought out at all: could Splunk be tricked into replicating the data by changing the site on a peer? I mean, if you know all of the single-site buckets are going to be assigned to site 1, can you configure one of your peers that is physically at site 2 to be site 1, and then shut down one of the site 1 peers to force replication over to site 2? And then reconfigure it to be site 2. Again, no idea if that makes sense, let alone whether it's doable. Or whether those would end up being excess buckets and eventually deleted...
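If anyone wants to experiment with that idea: the site assignment lives in server.conf on each peer, so the "flip" would look something like this. This is speculative, not a documented migration procedure, and whether the copies later get removed as excess buckets is exactly the open question above:

```ini
# server.conf on a peer physically located in DC2 -- temporarily
# claim site1 so the cluster replicates site1 buckets onto it.
# Speculative idea, not a documented procedure.
[general]
site = site1
```

After replication completes, you would set `site = site2` back and restart the peer.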
Anyway, good luck!
Hi! Thank you for your answer 🙂 And for your ideas!
The second idea could make sense... I wouldn't even have thought of that!
Headaches that I already have 🙂
For sure, I will update this post in any case.
I would be curious too 🙂
So far, we have migrated all the virtual nodes to the new physical dedicated servers, still in the same single-site indexer cluster, and took the opportunity to update Splunk.
In a few weeks we will migrate the single-site cluster to multi-site.
As there is no fully satisfying solution, we ended up deciding that, since the largest indexes (which are also the critical indexes that receive large data volumes and must be covered by the DC disaster recovery plan) have short retention, the situation will become compliant after a few months of multi-site operation.
Other data with long retention (from more than 6 months to x years) is not critical, and there is no need for this data to be part of the DC recovery plan.
"Hot" data will always be available and eligible for the DC recovery plan, and the Splunk service will remain fully available for these critical perimeters, which is what the plan requires.
If we identify long-term data that is critical, we will have to export and re-index it; as this can represent a lot of work, we will keep it to a minimum.
Will update 🙂
How did your data migration go? We're planning a similar activity, where we want to migrate our existing Splunk deployment with its data to new hosts. Most of our data is critical, and again it's a huge data set (TB) with long retention policies.
We will obviously have the new cluster set up and then update outputs.conf to send data to the new cluster, but for the old data, which is still critical, how can we make it searchable if we decide to decommission the old servers soon?
Any ideas / thoughts from the community would be appreciated.