Best approach to move data to a different index?

lyndac
Contributor

Using Splunk 6.3.1, 1 search head, 4 indexers, 1 UF.

I have a lot of data that got put into the wrong index. We have to segregate our data into different indexes based on the value of a specific field. If the field is not present (or doesn't match one of the transforms), the data is put into an "error" index. Everything was working perfectly until someone upstream from me changed the values of the field without telling me. Now I have data in the error index that should not be there. What is the easiest way to get the raw data back out and re-index it correctly? (I'm not worried about the cost of re-indexing, and I can | delete the data from the error index after it is moved to the correct index.)

The original data was in JSON format. I tried using the dump command: index=foo field=value | dump basefilename=20160510-wrongidx format=raw fields=_raw. This command tried to generate a dump file for every source (of which there were millions), so the search kept dying. I narrowed it down to just a day's worth and got output, but it wasn't in a format I can easily use.

Is there a better approach? What would you guys recommend?

1 Solution

lguinn2
Legend

First, I hope you have fixed the inputs and transforms so that all the data is going where it should. This fixes the problem "from this day forward."

Now you have to deal with the data during the period when things were wrong. Since you cannot move data between indexes, you have two tasks: (1) get the data into the right index and (2) remove the data from the wrong index.

1 - Get the data into the right index. If you have the original input files/logs: you can reindex them. Use btprobe on the forwarders to reset the file pointer so that the files will be reindexed. Be sure to move any data that would be duplicated...
If you don't have the original input files/logs: things are harder. You can search for the data that needs to be reindexed and use export to push it to disk. But then you may need to clean it up to get it back to a format that can be re-indexed properly.

2 - Remove the data from the wrong index. You have two choices: either let the data "age out" of the index naturally or use the delete command. If you use the delete command, just remember that it does not recover the disk space.
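
For the "export, then clean up" route in step 1, here is a minimal Python sketch of what the cleanup could look like. It assumes the search results were exported as newline-delimited JSON where each line carries the event fields in a "result" object with a "_raw" key; that layout is an assumption about your export, so check a few lines of your own file first. The function names and file paths are hypothetical, purely for illustration.

```python
import json
from typing import Iterable, Iterator


def extract_raw_events(export_lines: Iterable[str]) -> Iterator[str]:
    """Yield the original _raw text from each exported result line."""
    for line in export_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumption: event fields sit inside a "result" object;
        # fall back to the record itself if that wrapper is absent.
        result = record.get("result", record)
        raw = result.get("_raw")
        if raw is not None:
            yield raw


def rewrite_export(src_path: str, dst_path: str) -> int:
    """Write one raw event per line to dst_path and return the event count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for raw in extract_raw_events(src):
            dst.write(raw + "\n")
            count += 1
    return count
```

Something like rewrite_export("export.json", "reindex-me.json") would then give you a file of raw JSON events that a normal file input can pick up and route through the corrected transforms. If your events span multiple lines, one event per output line will not be enough on its own, and you would need matching line-breaking settings on the input.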

Documentation for btprobe
btprobe example
