re-index cleaned data

Explorer

I cleaned a Splunk index of everything prior to a certain date. Now it seems that I need that data to be searchable again.

My understanding is that nothing is actually removed from the index, so how do I re-index it for a time period? For example, re-index data prior to July.

Or can I just run splunk clean eventdata -index _thefishbucket? Will this do what I'm looking for?

Splunk Employee

The simple method to clean and reindex data, if you still have the original log files, is to:

1 - Hide the events already indexed by using a very selective delete command: http://www.splunk.com/base/Documentation/4.1.6/Admin/RemovedatafromSplunk

Tip: if it's a single sourcetype, specify it. If it's a particular period, use earliest and latest.

2 - Test a new batch input (see http://www.splunk.com/base/Documentation/4.1.6/Admin/Inputsconf). Create a new input using the batch command on a specific folder and specify all the meta fields to match your goals (new sourcetype, specific host name, source, test index). When your input is tested and validated, clean the test index and change to the final index.

Tip: if you have trouble handling the diversity of meta fields, you can also create several batch inputs with different settings (one per host, or with a specific redefined source path ...).

3 - Reindex your files: copy your logs into the folder. They will be indexed, and the files will be deleted after import.

Tip: if you anticipate blowing through your daily license quota, you can spread the reindexing over several days.
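
As a rough sketch of steps 2 and 3, a batch stanza in inputs.conf might look like the following. The folder /opt/reindex/web01, the host web01, the sourcetype access_combined and the index reindex_test are all made-up placeholders, not values from this thread; substitute your own and check the inputs.conf documentation for your version.

```ini
# inputs.conf (sketch; path, host, sourcetype and index are hypothetical)
[batch:///opt/reindex/web01]
# sinkhole: files in this folder are deleted once they have been indexed
move_policy = sinkhole
host = web01
sourcetype = access_combined
# start with a test index, then switch to the final index once validated
index = reindex_test
```

Because of move_policy = sinkhole, only ever place copies of your logs in this folder, never the originals. For several hosts, one stanza per folder with its own host value is one way to keep the meta fields straight.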

Splunk Employee

You could set up a monitor input on the files you wish to reindex, with a crcSalt value set. This perturbs our hashing function, so Splunk will consider the files new and reindex them.

Be careful: if you point at the whole directory with this pattern (e.g. with no whitelist or blacklist), then you will reindex everything. It might be good to set up a test index or a test Splunk instance first.
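
A minimal sketch of such a monitor stanza, assuming the archived logs live in a hypothetical /var/log/archive/app directory and carry the date in the file name (for example app.log.2010-06-30). The path, the file naming scheme, the whitelist pattern and the index name are illustrative assumptions only.

```ini
# inputs.conf (sketch; path, file naming and index are hypothetical)
[monitor:///var/log/archive/app]
# only pick up files dated January to June 2010, i.e. prior to July
whitelist = \.log\.2010-0[1-6]
# mix the full source path into the CRC so the files look new to Splunk
crcSalt = <SOURCE>
# point at a test index first
index = reindex_test
```

The whitelist is a regular expression matched against the file path, so it is worth confirming on the test index that only the intended files are picked up before switching to the real index.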

Splunk Employee

What you can do is reindex the data. It seems to me that you still have the data on your drives somewhere? I would suggest gathering the data that you want reindexed and placing it all in one directory, then monitoring this directory, so that the CRC changes for all files. Note, though, that the source will change as well. There is no need to clean the fishbucket.

This might be a lesson not only for you but for other people as well: almost never use the | delete command. There is no "undelete" command, and that is the reason why we also make it quite "difficult" to delete data. If you think that you do not currently need the data, then roll your buckets and archive the data you do not need. Basically, set up a smart data retention policy and you will almost never have to delete data.
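
To illustrate what such a retention policy could look like, here is a sketch of an indexes.conf stanza. The 90-day window and the archive path are arbitrary example values, and coldToFrozenDir may not be available in every release, so check the indexes.conf documentation for your version before relying on it.

```ini
# indexes.conf (sketch; retention period and archive path are hypothetical)
[main]
# buckets whose newest event is older than 90 days (90 * 86400 s) are frozen
frozenTimePeriodInSecs = 7776000
# copy frozen buckets here instead of simply deleting them
coldToFrozenDir = /opt/splunk_archive/main
```

With something like this in place, old data leaves the searchable index on its own and the archived buckets remain available if they are needed again.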

Hope this helps.
Cheers,
.gz

Splunk Employee

How did you clean the index before? Did you do it from the web interface using the | delete command?