Getting Data In

Transfer indexed data from standalone Splunk instance to clustered index.

phoenixdigital
Builder

So we have a single standalone Splunk instance (indexer/search head) with a year or more of indexed data on it.

We have been asked to create a brand-new clustered solution with:

1x master
1x deployment server and license master
2x independent search heads
2x indexers

Now, I am aware that when you upgrade an indexer from standalone to clustered, the old data remains on the original indexer and is not distributed throughout the cluster.
http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Migratenon-clusteredindexerstoaclustereden...

The kicker here is that they don't want to upgrade the older standalone instance; they want to use a fresh install.

How can I migrate the indexed data from the old standalone instance to one of the new clustered indexers?

All Splunk versions are 6.x and 5.x so there should be no issues with indexer formats.

Is it just a matter of "thawing" the old data onto one of the clustered indexers?
http://docs.splunk.com/Documentation/Splunk/6.2.3/Indexer/Restorearchiveddata#Clustered_data_thawing

Is this the only way to thaw multiple buckets?
http://answers.splunk.com/answers/120007/thawing-out-multiple-buckets-at-once.html

Finally, thawed data never expires or gets archived, does it? They will have to manually remove it from thawed when they no longer want it.

acharlieh
Influencer

I actually presented at .conf last year about our migration from 4.3 standalone indexers to a 6.x cluster. It went like this:

  1. Install the new 6.x cluster with all existing indexes (at least those that we wanted to keep) configured
  2. Take a downtime of both the old indexers and the 6.x cluster, during which you'll:
    • Roll hot buckets to warm on the old indexers (temporarily disabling/blocking forwarding may be necessary here)
    • Copy old indexes to the new indexers (if you're going from 1 to multiple indexers, you can spread out the indexes across the new cluster)
    • Ensure that the 6.x cluster has all of the existing index/parse time configurations distributed to the slave indexers
    • Configure the old indexers to indexAndForward to the new indexers (ensure that the 6.x receiving ports are open)
  3. Bring the 6.x cluster back up, followed by the old server(s)

What you've essentially done here is copy the data so that your fresh install is now effectively an upgrade, and make it so that any data that comes into the old indexers is also sent on to the new cluster. This enables a controlled migration of any forwarders that you might have (especially if other teams control said forwarders) from one environment to the other. All of your users' data is in both places, so they can compare how dashboards and reports look in the old and new environments.

Note that you do double index any new data coming into the old indexers in this scenario (at least until you retire those indexes on the old indexers, or the old indexers entirely), so work with your sales rep to let them know that you're doing this. This way, all of the data retires out as normal. The copied buckets are still non-replicated, just as with the upgrade, but all new buckets will be replicated, even in the same indexes.
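For the index-and-forward step, the old indexers need an `outputs.conf` that keeps indexing locally while also forwarding to the new cluster. The following is a minimal sketch that only renders such a fragment as text. The group name `new_cluster`, the host names, and port 9997 are placeholders rather than anything from this thread, and the settings should be checked against the `outputs.conf` spec for your Splunk version.

```python
def render_outputs_conf(receivers, group="new_cluster"):
    """Render an outputs.conf fragment for index-and-forward.

    receivers: list of "host:port" strings for the new indexers (placeholders).
    group:     hypothetical name for the tcpout target group.
    """
    if not receivers:
        raise ValueError("at least one receiver is required")
    lines = [
        "[tcpout]",
        f"defaultGroup = {group}",
        # Keep indexing locally in addition to forwarding.
        "indexAndForward = true",
        "",
        f"[tcpout:{group}]",
        f"server = {','.join(receivers)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_outputs_conf(["idx1.example.com:9997", "idx2.example.com:9997"]))
```

The rendered text would go into `outputs.conf` on each old indexer, followed by a restart. The new indexers also need a matching receiving port enabled.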

If you have too much data to copy, then sk314's suggestion (confirmed by Raghav2384) of keeping the old indexers around as additional search peers until the data retires also works, especially if you keep the same index names across both environments. However, your timeline for reclaiming the old hardware versus your retention time may sway you one way or the other.

You are also correct: if you went the route of putting buckets in thawed, you would have to manually remove them when you no longer want them.
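On thawing multiple buckets at once: for archived buckets that contain only rawdata, the documented approach is to copy each bucket into the index's `thaweddb` directory and run `splunk rebuild` on it, so handling many buckets comes down to a loop. Here is a rough sketch that just builds the commands without running anything. The default `splunk_bin` path and the `db_` prefix check are my assumptions, and buckets copied straight from a live index may not need rebuilding at all.

```python
from pathlib import Path


def rebuild_commands(thawed_dir, splunk_bin="/opt/splunk/bin/splunk"):
    """Return one `splunk rebuild <bucket>` argv list per bucket directory.

    Only directories whose names start with "db_" are treated as buckets.
    Nothing is executed here; the caller decides whether to run the commands.
    """
    root = Path(thawed_dir)
    buckets = sorted(
        p for p in root.iterdir() if p.is_dir() and p.name.startswith("db_")
    )
    return [[splunk_bin, "rebuild", str(bucket)] for bucket in buckets]


if __name__ == "__main__":
    import sys

    # Print the commands for review, e.g. python thaw.py /path/to/thaweddb
    for argv in rebuild_commands(sys.argv[1]):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` once the output looks right. A Splunk restart is still needed afterwards for the thawed buckets to become searchable.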

sk314
Builder

Do you want to have the old data replicated as well? This can be done but is not usually recommended (per the Splunk documentation).

However, if you just want the old data to be searchable in the new setup, you can add the old instance as a non-clustered search peer on the search head.

phoenixdigital
Builder

This is the route we ended up taking, but if the customer had insisted on replicating the old data, I would likely have followed acharlieh's advice posted here.

phoenixdigital
Builder

If we could replicate the old data, that would be great, but as you said, I saw it was not recommended, so I had no idea of the process for attempting it.

I think they want to retire the old standalone instance so moving the data to one of the new indexers would be the preferred option.

sk314
Builder

I think you should look at the other option. You could make it a non-clustered search peer and then retire it once the data on it becomes too old. New data will be indexed on the new system anyway.

phoenixdigital
Builder

Thank you. I have put forward that option, and hopefully they agree that it is the way forward.

So, another question: if the clustered indexers have data in an index called thisIndex, and this standalone non-clustered peer has data in an index with the same name, does the search head know to look at both?

sk314
Builder

The search head distributes the search to all peers and combines the results, so you'll see results from both (if they match the search query, of course!).
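To picture why the same index name on both the cluster and the old standalone peer just works, here is a toy model of the fan-out-and-merge idea. It is only an illustration and not how Splunk implements distributed search. The peer names, event data, and the `distributed_search` function are all made up for the example.

```python
import heapq


def distributed_search(peers, index, predicate):
    """Toy model: apply the same filter on every peer, then merge by time.

    peers maps a peer name to {index_name: [(timestamp, raw_event), ...]}.
    Returns (timestamp, raw_event, peer_name) tuples, newest first.
    """
    per_peer = []
    for name, indexes in peers.items():
        # Each peer searches only its own copy of the named index.
        hits = [
            (ts, raw, name)
            for ts, raw in indexes.get(index, [])
            if predicate(raw)
        ]
        per_peer.append(sorted(hits, reverse=True))
    # The "search head" merges the already-sorted per-peer results.
    return list(heapq.merge(*per_peer, reverse=True))


if __name__ == "__main__":
    peers = {
        "cluster-idx1": {"thisIndex": [(300, "error new"), (250, "info new")]},
        "old-standalone": {"thisIndex": [(100, "error old")]},
    }
    for row in distributed_search(peers, "thisIndex", lambda raw: "error" in raw):
        print(row)
```

In this model the search head never needs to know which peer is clustered. It only needs each peer to answer for the index name it was asked about.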

Raghav2384
Motivator

I have done exactly this, as recommended, on a large-scale deployment: point the old indexers as search peers to my SH cluster and indexer cluster. If the index names are the same, it works flawlessly! Once the required retention's worth of data is on the cluster, you can remove the connection to the old indexers or keep using them.
