
What's the best way to migrate /db from one drive to another?

SplunkTrust

I'm migrating a standalone indexer from Windows to Linux. I mounted the snapshot onto the Linux box and am currently moving the data with rsync to the drive where Splunk is installed. I had a bucket collision while doing this; I suspect it was because I accidentally left the forwarder on.

So my question: is there a better way to approach this? Will turning the forwarder off while rsyncing prevent bucket collisions?


SplunkTrust

I have solved the issue of merging buckets.

If you want to limit your downtime, avoid keeping the indexers off, and still NOT lose any data, this method will work.

Background info
We had a standalone Windows indexer and wanted to migrate to a Linux indexer. We had 10TB of historical data across 63 indexes to move to the Linux server. We took a snapshot of the data and mounted it on the Linux server. I then moved over all the configs, dashboards, alerts, saved searches and fields. Once this was complete, it was time for the data migration.

Moving the Data

First, stop the forwarders and let the data queue up in the source log files while they are off.

Next, roll your hot buckets to warm. You can do this by restarting Splunkd or by running a simple command.
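For the "simple command" route, one option is to roll each index separately instead of restarting splunkd. A minimal sketch, assuming the Splunk CLI form splunk _internal call /data/indexes/<index>/roll-hot-buckets (check the REST API reference for your version); the binary path, index names and credentials are hypothetical placeholders. The script only prints the commands so they can be reviewed before anything runs.

```python
import shlex

# Hypothetical values: replace with your install path, your real index names
# (the migration above had 63 of them) and real admin credentials.
SPLUNK_BIN = "/opt/splunk/bin/splunk"
INDEXES = ["main", "example_index"]
AUTH = "admin:changeme"


def roll_hot_commands(splunk_bin, indexes, auth):
    """Build one CLI call per index asking splunkd to roll that index's hot buckets to warm.

    Assumes the `splunk _internal call /data/indexes/<index>/roll-hot-buckets` form.
    """
    return [
        [splunk_bin, "_internal", "call", f"/data/indexes/{name}/roll-hot-buckets", "-auth", auth]
        for name in indexes
    ]


if __name__ == "__main__":
    # Print rather than execute, so the commands can be reviewed first.
    for argv in roll_hot_commands(SPLUNK_BIN, INDEXES, AUTH):
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv lists can be handed to subprocess.run once you are happy with them.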

You will need to create a new folder in SPLUNK_HOME/splunk/var/lib/splunk/.../, which we will call copy.

Then rsync the data from the Windows mount into the copy folder on the Splunk mount. This prevents you from overwriting your existing data and keeps your index available while the rsync is running. The sync could take hours, if not days.
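A minimal sketch of that sync for a single index, wrapped in Python so it can be looped over many indexes. The snapshot mount point, index name and destination path are hypothetical placeholders; the flags are standard rsync options.

```python
import os
import shlex
import subprocess

# Hypothetical paths: one index's db folder on the Windows snapshot mount, and the
# "copy" folder created next to that index's db folder on the Linux indexer.
SNAPSHOT_DB = "/mnt/windows_snapshot/example_index/db"
COPY_DIR = "/opt/splunk/var/lib/splunk/example_index/copy"


def rsync_argv(src, dst, dry_run=False):
    """Build an rsync call that copies the contents of src into dst.

    -a recurses and preserves timestamps and permissions; --partial keeps partly
    transferred files so an interrupted run can resume. The trailing slash on src
    makes rsync copy what is inside src rather than src itself.
    """
    argv = ["rsync", "-a", "--partial"]
    if dry_run:
        argv.append("--dry-run")  # report what would be copied, write nothing
    return argv + [src.rstrip("/") + "/", dst.rstrip("/") + "/"]


def sync(src, dst, dry_run=True):
    os.makedirs(dst, exist_ok=True)  # the "copy" folder from the previous step
    argv = rsync_argv(src, dst, dry_run)
    print(shlex.join(argv))
    return subprocess.run(argv, check=True)
```

Run sync(SNAPSHOT_DB, COPY_DIR) first as a dry run, then with dry_run=False. Because rsync by default skips files whose size and modification time already match, re-running the same command after an interruption only transfers what is missing or changed.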

Once that is complete, copy everything from the copy folder into the SPLUNK_HOME/splunk/var/lib/splunk/.../db folder. This will cause some bucket collisions, but they are easy to fix. It will look something like this:

db_latest_earliest_BID

db_02342_2342432_01
db_02342_2342432_02
db_02342_2342432_03
db_02342_2342432_04
db_02342_2342432_05
db_02342_2342432_06
db_02342_2342432_01
db_02342_2342432_02
db_02342_2342432_03
hot_v1_07
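To see which bucket IDs will collide before they disable an index, the directory names can be grouped by their trailing ID. A minimal sketch, assuming warm/cold buckets are named db_<time>_<time>_<id> and hot buckets hot_v1_<id>, as in the listing above; clustered bucket names, which carry an extra GUID suffix, are intentionally not matched.

```python
import re
from collections import defaultdict

# Assumed naming: two epoch timestamps then the ID, or hot_v1_<id>.
# Names with anything after the ID (e.g. a clustered GUID) do not match.
BUCKET_RE = re.compile(r"^(?:db_\d+_\d+|hot_v1)_(?P<bid>\d+)$")


def find_collisions(names):
    """Return {bucket_id: [names...]} for every ID used by more than one bucket."""
    by_id = defaultdict(list)
    for name in names:
        m = BUCKET_RE.match(name)
        if m:  # skip anything that is not a bucket directory
            by_id[int(m.group("bid"))].append(name)
    return {bid: group for bid, group in by_id.items() if len(group) > 1}
```

For example, feeding it os.listdir(db_dir) + os.listdir(copy_dir), where db_dir and copy_dir are hypothetical variables holding the two folder paths, lists the clashes before anything is copied.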

Notice that the bucket IDs are duplicated. This will automatically disable the index and make the data unsearchable. To fix it, simply rename the offending buckets so that each one gets a new, higher bucket ID, like this:

db_02342_2342432_01
db_02342_2342432_02
db_02342_2342432_03
db_02342_2342432_04
db_02342_2342432_05
db_02342_2342432_06
db_02342_2342432_07
db_02342_2342432_08
db_02342_2342432_09
hot_v1_10
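One rule that reproduces the renaming above: the first bucket to claim an ID keeps it, and each later bucket whose ID is already taken gets one above the highest ID assigned so far (which is why the hot bucket moves from 07 to 10). A minimal planning sketch under the same naming assumption as the listing; it only computes new names and renames nothing on disk.

```python
import re

# Assumed naming: db_<time>_<time>_<id> for warm/cold buckets, hot_v1_<id> for hot ones.
BUCKET_RE = re.compile(r"^(?P<prefix>db_\d+_\d+_|hot_v1_)(?P<bid>\d+)$")


def renumber(names):
    """Return new names, in input order, with every repeated bucket ID bumped.

    The first bucket to use an ID keeps it; a later bucket whose ID is taken
    gets max(used) + 1. Zero-padding width is preserved; non-bucket names pass through.
    """
    used = set()
    result = []
    for name in names:
        m = BUCKET_RE.match(name)
        if m is None:
            result.append(name)
            continue
        digits = m.group("bid")
        bid = int(digits)
        if bid in used:
            bid = max(used) + 1  # guaranteed unused so far
        used.add(bid)
        result.append(m.group("prefix") + str(bid).zfill(len(digits)))
    return result
```

Since two directories cannot share a name inside one folder, it may be simpler to apply the resulting plan (for example with os.rename) to the buckets in the copy folder before the final copy into db, rather than afterwards.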

Your data will then be available with ZERO gaps.

Turn your forwarders back on and pat yourself on the back.

*Note: business rules required limited or zero downtime for the indexer, so I could not power it down while syncing the data. This is NOT the preferred method; you should just copy your /opt drive and sync it over while the indexer is off. If you have a similar situation that requires limited downtime and merging data, you can take this approach. I'd recommend practicing until you're comfortable, or contacting support.

Legend

Just remember that your solution will not work in a clustered environment, where bucket naming is significantly more complex.

Legend

Nope. Remember that Splunk may move buckets between hot/warm and cold for a variety of reasons, not just because data is arriving from the forwarder.

The only method that Splunk recommends is this:

  1. Stop the indexer. In this case, you need to stop BOTH indexers. You may also need to clean up the new indexer, or even completely reinstall it, before you begin.
  2. Take the snapshot or whatever of the old indexer, and mount/copy/rsync it to the proper location on the new indexer. Both indexers must be down throughout the copy process.
  3. Start the new indexer.
  4. DO NOT turn the old indexer back on.
  5. Any time after step 1, you can edit the forwarders to send their data to the new indexer, so this can happen while you are waiting for the copy in step 2 to complete. Note that no data will actually be forwarded until the new indexer is turned on and receiving data in step 3. [BTW, be sure that you have turned on the appropriate receiving port on the new indexer!]
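Whether the receiving port is actually open can be checked from any machine that can reach the new indexer with a plain TCP connect test. A minimal sketch; the host name in the usage line is a hypothetical placeholder and 9997 is only the conventional Splunk receiving port. A successful connect shows that something is listening, not that it is splunkd's receiver.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and tries each address it returns.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False
```

Usage: port_accepts_connections("new-indexer.example.com", 9997).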

HTH!
Lisa

SplunkTrust

@iguinn, unfortunately we are too far in to go this route, and I could not get approval to shut both indexers off until the copy was done, even though the data would retroactively fill in once they were turned back on 😕

I think the route I'm going with now is to overwrite our existing indexes with the snapshotted data from a day ago. This will leave a one-day gap in my data. I will then go to the log files and modify the first line of each so that Splunk reindexes them. What are your thoughts on this?

Legend

If you receive events only from forwarders, you could simply close (or not open) port 9997 during the migration.
Beware of the paths, especially in indexes.conf!
Bye.
Giuseppe
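The path warning matters most when the old indexer ran on Windows: a homePath, coldPath or thawedPath that still uses a drive letter or backslashes will not resolve on Linux. A rough lint sketch that uses Python's configparser as an approximation of Splunk's .conf syntax; it is not Splunk's own parser, so treat its output as a hint rather than validation.

```python
import configparser
import re

PATH_KEYS = ("homePath", "coldPath", "thawedPath")
# A leading drive letter (D:\ or D:/) or any backslash suggests a Windows path.
WINDOWS_PATH = re.compile(r"^[A-Za-z]:[\\/]|\\")


def windows_style_paths(conf_text):
    """Return (stanza, key, value) for every index path that still looks like a Windows path."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep Splunk's camelCase keys as written
    parser.read_string(conf_text)
    hits = []
    for stanza in parser.sections():
        for key in PATH_KEYS:
            value = parser[stanza].get(key)
            if value and WINDOWS_PATH.search(value):
                hits.append((stanza, key, value))
    return hits
```

Settings placed before the first stanza make configparser raise MissingSectionHeaderError, so such a file needs a dummy stanza prepended before linting.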