Getting Data In

Best approach to move data to a different index?

lyndac
Contributor

Using Splunk 6.3.1, 1 search head, 4 indexers, 1 UF.

I have a lot of data that got put into the wrong index. We have to segregate our data into different indexes based on the value of a specific field. If the field is not present (or doesn't match one of the transforms), the data is put into an "error" index. Everything was working perfectly until someone upstream from me changed the values of the field without telling me. Now I have data in the error index that should not be there. What is the easiest way to get the raw data back out and re-index it correctly? (I'm not worried about the cost of re-indexing, and I can use | delete to remove the data from the error index after it is moved to the correct index.)

The original data was in JSON format. I tried using the dump command: index=foo field=value | dump basefilename=20160510-wrongidx format=raw fields=_raw. This command tried to generate a dump file for every source (of which there were millions), so the search kept dying. I narrowed it down to just a day's worth and got output, but it wasn't in a format I can easily use.

Is there a better approach? What would you guys recommend?

1 Solution

lguinn2
Legend

First, I hope you have fixed the inputs and transforms so that all the data is going where it should. This fixes the problem "from this day forward."

Now you have to deal with the data during the period when things were wrong. Since you cannot move data between indexes, you have two tasks: (1) get the data into the right index and (2) remove the data from the wrong index.

1 - Get the data into the right index. If you have the original input files/logs: you can reindex them. Use btprobe on the forwarders to reset the file pointer so that the files will be reindexed. Be sure to move any data that would be duplicated...
If you don't have the original input files/logs: things are harder. You can search for the data that needs to be reindexed and use export to push it to disk. But then you may need to clean it up to get it back to a format that can be re-indexed properly.
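As a rough illustration of the "clean it up" step, here is a minimal Python sketch. It assumes the export was written as line-delimited JSON in which each line carries a "result" object containing the event's _raw field, and that each original event fits on a single line. Check that layout against your own export before relying on it. The function and file names are made up for the example and are not part of Splunk.

```python
import json
from typing import Iterable, Iterator


def extract_raw_events(export_lines: Iterable[str]) -> Iterator[str]:
    """Yield the original event text (_raw) from line-delimited JSON export output.

    Assumed layout (verify against your export): one JSON object per line,
    with the event fields nested under a "result" key.
    """
    for line in export_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a complete JSON object
        if not isinstance(record, dict):
            continue
        result = record.get("result")
        if isinstance(result, dict) and "_raw" in result:
            yield result["_raw"]


def write_raw_file(export_path: str, output_path: str) -> int:
    """Write one raw event per line to output_path; return the number written.

    Both paths are hypothetical placeholders supplied by the caller.
    """
    count = 0
    with open(export_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for raw in extract_raw_events(src):
            dst.write(raw.rstrip("\n") + "\n")
            count += 1
    return count
```

Something like write_raw_file("export.json", "reindex.json") would then produce a file that can be picked up by a monitored input and routed by the corrected transforms. Multi-line events would need a different delimiter than a newline.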

2 - Remove the data from the wrong index. You have two choices: either let the data "age out" of the index naturally or use the delete command. If you use the delete command, just remember that it does not recover the disk space.

Documentation for btprobe
btprobe example

0 Karma

inventsekar
Ultra Champion
I would go for one of these methods to export a large data set:

http://docs.splunk.com/Documentation/Splunk/6.4.0/Search/Exportsearchresults#Export_data_using_the_C...
http://docs.splunk.com/Documentation/Splunk/6.4.0/Search/Exportsearchresults#Export_using_the_Splunk...

Well, I clicked these links, but I was asked to log in to my Splunk user account. I did, and it still simply took me to docs.splunk.com.

Then I understood that the Splunk 6.4.0 documentation is no longer available at all.

In that case, the links should automatically redirect to the latest version, right? Just a suggestion. Thanks.

0 Karma