Are there any recommended strategies for moving the index data to a new data center?
We're planning to build up a new Splunk cluster (indexers, searchheads, master, license-master) in a new data center. We want to migrate the existing index data.
The most straightforward solution would probably be:
However, we have roughly 4-5TB of index data, so that would mean a very long (down)time.
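As a rough back-of-the-envelope check of that downtime, the sketch below estimates the raw copy time for a given data size and link speed. The 1000 Mbit/s link is purely an assumed figure, and the result ignores protocol overhead and disk throughput, so real transfers would likely take longer.

```shell
# Rough transfer-time estimate.
# $1 = data size in TB (decimal), $2 = usable link speed in Mbit/s.
# Both inputs are illustrative assumptions, not measured values.
transfer_hours() {
    size_tb=$1
    link_mbit=$2
    # 1 TB = 8,000,000 Mbit; Mbit / (Mbit/s) = seconds; / 3600 = hours
    awk -v s="$size_tb" -v l="$link_mbit" \
        'BEGIN { printf "%.1f\n", (s * 8 * 1000000) / l / 3600 }'
}

transfer_hours 4 1000   # lower bound of the 4-5TB range
transfer_hours 5 1000   # upper bound of the 4-5TB range
```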
Are there better solutions?
I was thinking about extending the existing indexer cluster to the new DC and increasing the replication factor so that at least one of the indexers in the new DC is guaranteed to hold the data.
Then, when everything is synced, shut down the old indexers, disconnect the new indexers from the old master, and connect them to the new master. Done.
Has anyone had experience with such a scenario? Or any other proposed solutions?
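The replication-factor idea above rests on a simple counting argument: in a single-site cluster each copy of a bucket lives on a different peer, so once the replication factor exceeds the number of old peers, at least one copy has to land on a new peer. A minimal sketch of that check follows; the peer counts are hypothetical, and it assumes a single-site cluster (a multisite setup would use site-level factors instead).

```shell
# Does a given replication factor force at least one copy onto a new peer?
# $1 = old peers, $2 = new peers, $3 = replication factor.
# Assumes a single-site cluster where each bucket copy sits on a distinct peer.
rf_forces_new_dc_copy() {
    old_peers=$1
    new_peers=$2
    rf=$3
    if [ "$rf" -gt $((old_peers + new_peers)) ]; then
        echo "invalid: RF exceeds total peers"
    elif [ "$rf" -gt "$old_peers" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

rf_forces_new_dc_copy 3 3 4   # hypothetical: 3 old + 3 new peers, RF 4
rf_forces_new_dc_copy 3 3 3   # RF 3 could still be satisfied by old peers alone
```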
Assuming you have the bandwidth & connectivity, you could simply add the new peers to the existing index cluster. Once SF/RF have been met, offline the "old" peers one at a time. Then you have the data and configuration in the new location.
Offline Reference: Take a peer down permanently
Then you can replace the master node through this documented process: Replace Master Node. Finally, you would swap the License Master through this documented process: Swap License Master.
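For the first step, joining a new indexer to the existing cluster, a sketch along these lines may help. It uses the `splunk edit cluster-config` CLI as it exists in the 8.0.x releases discussed here; the master URI, replication port, and secret are placeholders. The `run` helper is a hypothetical dry-run wrapper that only prints the commands unless `DRY_RUN=0` is set.

```shell
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Join this indexer to an existing cluster as a peer.
join_cluster() {
    master_uri=$1   # management URI of the existing master (placeholder)
    repl_port=$2    # replication port to open on this peer (placeholder)
    secret=$3       # the cluster's shared secret (placeholder)
    run splunk edit cluster-config -mode slave \
        -master_uri "$master_uri" -replication_port "$repl_port" -secret "$secret"
    run splunk restart
}

join_cluster https://old-master.example.com:8089 9887 your_key
```

Once all new peers have joined, running `splunk show cluster-status` on the master is one way to see whether the replication and search factors are met before taking any old peer offline.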
Having tried that successfully with 2 test clusters, I'm now trying it on our actual cluster, and the replication seems to be stuck, or rather probably hasn't really started yet.
I see that the number of buckets on the new indexers has risen to ~half of the old indexers, but they don't seem to contain data yet, as the disks are still pretty much empty.
A couple of WARNs in the log files, but nothing that lets me really pinpoint the issue. The most suspicious ones, IMO:
WARN AdminHandler:AuthenticationHandler - Denied session token for user: splunk-system-user
(based on other forum posts, this seems to point to an shd -> indexer issue rather than to the index cluster)
or
CMMaster - event=handleReplicationError bid=main~2682~15487F52-1A61-440B-821A-BDA4AA62E608 tgt=DE69C0E9-DE8B-4863-B37E-734D138BDCCA peer_name=splunk-ind-02.live.eu-central-1.zeal.zone msg='target doesn't have bucket now. ignoring'
It's also strange that the new indexers have an empty value in the 'Indexer Cluster' column in the Monitoring Console/Instances view.
Update: it seems the issue might have been that the new indexers were running a newer version (8.0.4.1) than the old master and indexers (8.0.1).
After using 8.0.1 for the new indexers, too, it looks better now and the replication is apparently making progress.
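A quick pre-flight check for this kind of mismatch could look like the sketch below. It simply compares version strings that you have collected yourself (for example from `splunk version` on each host); the function name is hypothetical, and requiring identical versions is a conservative assumption based on the fix described above.

```shell
# Compare a list of version strings; report the first one that differs.
# $1 is taken as the reference (e.g. the master's version).
versions_match() {
    first=$1
    for v in "$@"; do
        if [ "$v" != "$first" ]; then
            echo "mismatch: $v != $first"
            return 1
        fi
    done
    echo "ok: all on $first"
}

versions_match 8.0.1 8.0.1 8.0.1          # master and peers aligned
versions_match 8.0.1 8.0.1 8.0.4.1 || true # the situation described above
```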
I've performed this very task myself, and it's fairly tedious, but does work.
The easiest way is to ensure that you have exactly the same number of indexers in the new DC as in the old one, and that they have enough storage presented to handle your data. High-level steps below:
1. Stop ALL forwarders.
2. On the new index cluster, deploy indexes.conf via the master. This will create the filesystems required.
3. Shut down both old and new index clusters.
4. Use rsync to transfer data from old indexer to new, meaning from old_indexer_01 to new_indexer_01
5. Repeat that on each indexer in your cluster (can be run simultaneously).
6. On the new indexers, ensure that the filesystem is owned by your Splunk user (chown -RP splunk:splunk <your_directories>).
7. Once rsync completes on all nodes, bring up the NEW index cluster.
8. Use tstats or similar to verify event counts and that all data is searchable.
9. Use DS to re-configure forwarders to point to new master, then start them back up.
10. Verify new events are coming in to new index cluster.
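The one-to-one copy in steps 4 to 6 can be planned with a small script like the one below. It only prints the commands, one per indexer pair, so they can be reviewed before anything is run. The host names follow the old_indexer_01 / new_indexer_01 pattern from step 4; the index path is an assumed default and the splunk:splunk owner comes from step 6, so adjust both to your environment.

```shell
# Assumed index location; override by exporting SPLUNK_DB_PATH.
SPLUNK_DB_PATH="${SPLUNK_DB_PATH:-/opt/splunk/var/lib/splunk}"

# Print the rsync (run on the old host) and chown (run on the new host)
# for each of $1 indexer pairs. Nothing is executed.
rsync_plan() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        n=$(printf '%02d' "$i")
        echo "[old_indexer_$n] rsync -a $SPLUNK_DB_PATH/ new_indexer_$n:$SPLUNK_DB_PATH/"
        echo "[new_indexer_$n] chown -RP splunk:splunk $SPLUNK_DB_PATH"
        i=$((i + 1))
    done
}

rsync_plan 3   # hypothetical cluster of three indexer pairs
```

For the verification in step 8, a search such as `| tstats count where index=* by index`, run against both clusters before and after the move, gives per-index event counts to compare.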
But that's an offline migration, right?
That's what I'm trying to avoid.
Yes, it is an offline method. But it is the fastest and most reliable method that I've found.
I did neglect to include one very important step, however. Before taking down the indexers, be sure to roll all hot buckets to warm (after stopping the forwarders).
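One way to roll hot buckets is the per-index `roll-hot-buckets` endpoint called through the Splunk CLI. The sketch below only prints the command for each index so the list can be checked first; `main` appears earlier in this thread, while the other index names are placeholders. Each printed command would be run on an indexer and will prompt for (or need `-auth` with) admin credentials.

```shell
# Print one roll command per index name given; nothing is executed.
roll_commands() {
    for idx in "$@"; do
        echo "splunk _internal call /data/indexes/$idx/roll-hot-buckets"
    done
}

roll_commands main my_index_a my_index_b   # my_index_* are hypothetical names
```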
No, you have the new index cluster peers in the data center which are part of the index cluster. To remove the "old peers" (previously existing index cluster peers), you would have to "offline" them.
From a doc I linked to in my previous post: https://docs.splunk.com/Documentation/Splunk/8.0.4/Indexer/Takeapeeroffline#Take_a_peer_down_permane...
The enforce-counts version of the offline command is intended for use when you want to take a peer offline permanently, but only after the cluster has returned to its complete state.
The index cluster itself would still be up and available. You would simply use
splunk offline --enforce-counts
on each of the existing peers that you want to remove.
Oh gotcha, I'm still getting used to our new community site as well. Thanks!