So we have a single standalone Splunk instance (indexer/search head) with a year or more of indexed data on it.
We have been asked to create a brand-new clustered solution with:
1x master
1x deployment server and license master
2x independent search heads
2x indexers
Now, I am aware that when you upgrade an indexer from standalone to clustered, the old data remains on the original indexer and is not distributed throughout the cluster.
http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Migratenon-clusteredindexerstoaclustereden...
The kicker here is that they don't want to upgrade the older standalone instance; they want to use a fresh install.
How can I migrate the indexed data from the old standalone instance to one of the new clustered indexers?
All Splunk versions are 6.x and 5.x, so there should be no issues with index formats.
Is it just a matter of "thawing" the old data onto one of the clustered indexers?
http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Restorearchiveddata#Clustered_data_thawing
Is this the only way to thaw multiple buckets?
http://answers.splunk.com/answers/120007/thawing-out-multiple-buckets-at-once.html
Finally, thawed data never expires or gets archived, does it? They will have to remove it from thawed manually when they no longer want it.
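If you do go the thawing route, the Restorearchiveddata page linked above thaws a bucket by copying it into the index's thaweddb directory and then running splunk rebuild on that bucket directory. For many buckets, a small loop saves typing. Below is a minimal Python sketch, assuming the buckets have already been copied into thaweddb and keep their standard db_<newest>_<oldest>_<id> directory names. It only builds and prints the commands (a dry run) rather than executing them; the function names are hypothetical.

```python
import shlex
from pathlib import Path


def rebuild_commands(thaweddb: Path, splunk_bin: str = "splunk") -> list[list[str]]:
    """Return one 'splunk rebuild <bucket_dir>' argv per bucket in thaweddb.

    Only directories named db_* or rb_* are treated as buckets; anything
    else (stray files, hot_* directories) is skipped.
    """
    buckets = sorted(
        p for p in thaweddb.iterdir()
        if p.is_dir() and p.name.startswith(("db_", "rb_"))
    )
    return [[splunk_bin, "rebuild", str(b)] for b in buckets]


def print_dry_run(thaweddb: Path) -> None:
    """Print each command so it can be reviewed before anything is run."""
    for cmd in rebuild_commands(thaweddb):
        print(shlex.join(cmd))
```

For example, print_dry_run(Path("/opt/splunk/var/lib/splunk/defaultdb/thaweddb")) would cover the main index on a typical Linux install (that path is an assumed install location); once the output looks right, it can be pasted or piped into a shell on the indexer.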
Do you want to have the old data replicated as well? This can be done, but it is not usually recommended (as per the Splunk documentation).
However, if you just want the old data to be searchable in the new setup, you can add the old instance as a non-clustered search peer on the search head.
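For reference, on a 6.x search head a non-clustered peer can be added with the splunk add search-server CLI command, which (unlike hand-editing distsearch.conf) takes care of sharing the search head's key with the peer. The Python sketch below only assembles that command line and checks that the peer is given as a scheme://host:port management URI. The host name and credentials are hypothetical placeholders, and it is worth checking the flags against the distributed search documentation for your exact version before running it.

```python
import shlex
from urllib.parse import urlsplit


def add_search_server_cmd(peer_uri: str, local_auth: str, remote_user: str,
                          remote_password: str, splunk_bin: str = "splunk") -> list[str]:
    """Build the argv that registers peer_uri as a search peer of this search head.

    peer_uri is the old indexer's management URI (scheme://host:port, usually
    port 8089). local_auth is "user:password" on the search head itself;
    remote_user/remote_password are credentials on the old indexer.
    """
    parts = urlsplit(peer_uri)
    if parts.scheme not in ("http", "https") or parts.hostname is None or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {peer_uri!r}")
    return [
        splunk_bin, "add", "search-server", peer_uri,
        "-auth", local_auth,
        "-remoteUsername", remote_user,
        "-remotePassword", remote_password,
    ]


if __name__ == "__main__":
    # Hypothetical host name and credentials; replace with your own.
    cmd = add_search_server_cmd("https://old-indexer.example.com:8089",
                                "admin:changeme", "admin", "old-password")
    print(shlex.join(cmd))
```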
I actually presented at .conf last year on how our migration from 4.3 standalone indexers to a 6.x cluster went. It went like this:
What you've essentially done here is copy the data so that your fresh install is now an upgrade, and make it so that any data that comes into the old indexers is also sent over to the new cluster. This enables a controlled migration of any forwarders you might have (especially if other teams control those forwarders) from one environment to the other, while all of your users' data is in both places, so they can compare how dashboards and reports look in the old and new environments. Note that in this scenario you double-index any new data coming into the old indexers (at least until you retire those indexes on the old indexers, or the old indexers entirely), so work with your sales rep and let them know you're doing this. In this way, all of the data retires out as normal: the copied buckets are still non-replicated, just as with an upgrade, but all new buckets will be replicated, even in the same indexes.
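The post doesn't spell out how the old indexers were set up to pass new data on to the new cluster. One common way, assumed here rather than taken from the presentation, is to set indexAndForward = true in outputs.conf on each old indexer, so it keeps indexing locally while also forwarding to the new peers. This Python sketch renders such a file; the group name and receiver host:port values are hypothetical (9997 is merely a conventional receiving port), and the new peers must already have receiving enabled.

```python
import configparser
import io


def index_and_forward_outputs_conf(group: str, receivers: list[str]) -> str:
    """Render outputs.conf text that indexes locally AND forwards to `group`."""
    conf = configparser.ConfigParser()
    conf.optionxform = str  # keep Splunk's camelCase keys (defaultGroup, indexAndForward)
    conf["tcpout"] = {"defaultGroup": group, "indexAndForward": "true"}
    # Target group listing the new cluster peers' receiving ports.
    conf[f"tcpout:{group}"] = {"server": ",".join(receivers)}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical peer host names for the two new clustered indexers.
    print(index_and_forward_outputs_conf(
        "new_cluster", ["idx1.example.com:9997", "idx2.example.com:9997"]))
```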
If you have too much data to copy, then sk314's suggestion (and Raghav2384's confirmation) of keeping the old indexers around as additional search peers until the data retires also works, especially if you keep the same index names in both environments. However, your timeline for reclaiming the old hardware versus your retention time may also sway you one way or the other.
You are also correct: if you went the route of putting buckets in thawed, you would have to remove them manually when you no longer want them.
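To put numbers on that trade-off: buckets freeze on age once their newest event is older than the index's frozenTimePeriodInSecs, so after forwarders stop sending to the old indexers, their data has fully aged out roughly one retention period after the last event they received. A minimal Python sketch of that arithmetic, assuming time-based retention is what applies (size limits such as maxTotalDataSizeMB can freeze buckets earlier, and the age check runs periodically rather than instantly). The function name and dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Splunk's default frozenTimePeriodInSecs (2184 days, roughly six years).
DEFAULT_FROZEN_SECS = 188_697_600


def old_peer_empty_after(last_event: datetime,
                         frozen_secs: int = DEFAULT_FROZEN_SECS) -> datetime:
    """When the newest event on the old peer passes its retention period.

    last_event is the time of the newest event the old peer indexed
    (for example, when the last forwarder was switched to the new cluster).
    """
    return last_event + timedelta(seconds=frozen_secs)


if __name__ == "__main__":
    cutover = datetime(2015, 7, 1, tzinfo=timezone.utc)  # hypothetical cutover date
    one_year = 365 * 24 * 3600                           # hypothetical 1-year retention
    print(old_peer_empty_after(cutover, one_year).date())
```

With the index's real retention setting in place of the one-year example, the result gives a rough earliest date for reclaiming the old hardware without losing searchable data.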
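Because thaweddb sits outside the normal hot/warm/cold/frozen aging, that cleanup is a manual job. Here is a minimal Python sketch for finding candidates, assuming standard bucket directory names, where the first number after db_ is the epoch time of the bucket's newest event (clustered buckets add a trailing _<guid>). It only lists buckets and deletes nothing; the function name and the cutoff you pass in are your own choice.

```python
import re
from pathlib import Path

# Non-clustered: db_<newest>_<oldest>_<localid>; clustered buckets add _<guid>.
BUCKET_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_\d+(?:_[0-9A-Fa-f-]+)?$")


def expired_thawed_buckets(thaweddb: Path, cutoff_epoch: int) -> list[Path]:
    """List thawed bucket dirs whose newest event is older than cutoff_epoch.

    Thawed buckets never freeze on their own, so this only identifies
    candidates; removing them is left to the operator.
    """
    out = []
    for p in sorted(thaweddb.iterdir()):
        m = BUCKET_RE.match(p.name)
        if p.is_dir() and m and int(m.group(1)) < cutoff_epoch:
            out.append(p)
    return out
```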
Adding the old instance as a non-clustered search peer is the route we ended up going with, but if the customer had insisted on replicating the old data, I would likely have followed acharlieh's advice posted here.
If we could replicate the old data, that would be great, but as you said, I saw it was not recommended, so I had no idea of the process to attempt it.
I think they want to retire the old standalone instance, so moving the data to one of the new indexers would be the preferred option.
I think you should look at the other option. You could make it a non-clustered search peer and then retire it once the data on it becomes too old. New data will be indexed on the new system anyway.
Thank you. I have put forward that option and hopefully they agree that it is the way forward.
So, another question: if the clustered indexers have data in an index called thisIndex, and this standalone non-clustered peer has data in an index with the same name, thisIndex, does the search head know to look at both?
The search head distributes the search to all peers and combines the results, so you'll see results from both (if they match the search query, of course!).
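As a toy model only (not Splunk's actual implementation), the search head's job here resembles a scatter-gather merge: each peer returns its own matching events newest first, and the search head interleaves them into one time-ordered result set. In a real search, the splunk_server field on each event shows which peer returned it, which is a handy way to confirm the old standalone peer is contributing. The Event type, function name, and peer names below are hypothetical.

```python
import heapq
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    time: int           # epoch seconds, playing the role of _time
    splunk_server: str  # which peer returned the event
    raw: str


def merge_peer_results(*peer_results: Iterable[Event]) -> list[Event]:
    """Interleave per-peer result streams into one newest-first list.

    Each peer's stream must already be sorted newest-first.
    """
    return list(heapq.merge(*peer_results, key=lambda e: e.time, reverse=True))


if __name__ == "__main__":
    cluster_peer = [Event(300, "idx1", "new event"), Event(100, "idx1", "older event")]
    old_peer = [Event(200, "old-standalone", "event from old data")]
    for e in merge_peer_results(cluster_peer, old_peer):
        print(e.time, e.splunk_server, e.raw)
```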
I have done exactly the same thing, as recommended, on a large-scale deployment: I added the old indexers as search peers to my SH cluster and indexer cluster instance. If the index names are the same, it works flawlessly! Once the required retention's worth of data is on the cluster, you can remove the connection to the old indexers or keep using them.