<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Re-index a file and prevent duplicates in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139801#M28728</link>
    <description>&lt;P&gt;You might want to make a report of the record IDs you have in Splunk, then cull those from your input file. Then use &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.0/Data/MonitorfilesanddirectoriesusingtheCLI"&gt;splunk add oneshot&lt;/A&gt; to import the file (or some other method).&lt;/P&gt;</description>
    <pubDate>Wed, 13 Nov 2013 01:31:21 GMT</pubDate>
    <dc:creator>jtrucks</dc:creator>
    <dc:date>2013-11-13T01:31:21Z</dc:date>
    <item>
      <title>Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139800#M28727</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have some CSV files which were indexed, but a proportion of the events were corrupted in the index. Each file has up to 1 million records. Is there a way to ask Splunk to re-index such a file and to only index events that it doesn't currently have? Each event has a unique record ID field.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 01:09:51 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139800#M28727</guid>
      <dc:creator>JeremyHagan</dc:creator>
      <dc:date>2013-11-13T01:09:51Z</dc:date>
    </item>
    <item>
      <title>Re: Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139801#M28728</link>
      <description>&lt;P&gt;You might want to make a report of the record IDs you have in Splunk, then cull those from your input file. Then use &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.0/Data/MonitorfilesanddirectoriesusingtheCLI"&gt;splunk add oneshot&lt;/A&gt; to import the file (or some other method).&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 01:31:21 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139801#M28728</guid>
      <dc:creator>jtrucks</dc:creator>
      <dc:date>2013-11-13T01:31:21Z</dc:date>
    </item>
    <item>
      <title>Re: Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139802#M28729</link>
      <description>&lt;P&gt;I was kind of hoping for something a little less manual....&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 01:36:27 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139802#M28729</guid>
      <dc:creator>JeremyHagan</dc:creator>
      <dc:date>2013-11-13T01:36:27Z</dc:date>
    </item>
    <item>
      <title>Re: Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139803#M28730</link>
      <description>&lt;P&gt;An easier way might be to delete the events you have in your index now, clean the fishbucket, and just let Splunk reindex them.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 01:53:30 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139803#M28730</guid>
      <dc:creator>ShaneNewman</dc:creator>
      <dc:date>2013-11-13T01:53:30Z</dc:date>
    </item>
    <item>
      <title>Re: Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139804#M28731</link>
      <description>&lt;P&gt;Clean the fishbucket?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 01:59:19 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139804#M28731</guid>
      <dc:creator>JeremyHagan</dc:creator>
      <dc:date>2013-11-13T01:59:19Z</dc:date>
    </item>
    <item>
      <title>Re: Re-index a file and prevent duplicates</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139805#M28732</link>
      <description>&lt;P&gt;Yes. This is what I do.&lt;/P&gt;

&lt;P&gt;Run the search that returns the events you need to delete; I assume you don't want to delete the entire index. If you do, run the command below with the name of the index you wish to wipe out, then clean _thefishbucket. Otherwise, run your search to find your events, then pipe it to "| delete".&lt;/P&gt;

&lt;P&gt;cd to the Splunk\bin directory, then run:&lt;/P&gt;

&lt;PRE&gt;splunk stop
splunk clean eventdata -index _thefishbucket
splunk start&lt;/PRE&gt;

&lt;P&gt;The rest is automatic, assuming you have fixed the files.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Nov 2013 02:14:01 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Re-index-a-file-and-prevent-duplicates/m-p/139805#M28732</guid>
      <dc:creator>ShaneNewman</dc:creator>
      <dc:date>2013-11-13T02:14:01Z</dc:date>
    </item>
  </channel>
</rss>

