How do I re-index an indexed S3 bucket?

jcoates_splunk
Splunk Employee

Just helped Support with this and want to document the results...

Let's say I've indexed an S3 bucket, then realized that my line breakers were wrong and I need to reindex. The input now has a seek pointer that prevents me from picking up the old data, so what do I do?

1 Solution

jcoates_splunk
Splunk Employee

If you didn't delete the data from S3, you should be fine.

  1. Delete the misindexed data. To soft-delete it, use the delete command; to truly nuke it, delete the index and create a new one.
  2. Fix your knowledge layer problem in props.conf, for instance by setting a sourcetype and turning off SHOULD_LINEMERGE.
  3. Delete the old modular input. It has cached a value for initial_scan_datetime that won't work for us, so we're going to configure a new input instead. See http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureInputs#S3_inputs for background.
  4. Add a new modular input and set initial_scan_datetime to a date far enough in the past to cover all of your data. The add-on will then fetch everything again, and Splunk will line break it properly.
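
As a rough illustration of steps 2 and 4, the config could look something like the sketch below. The sourcetype, stanza name, account, bucket, and index are all hypothetical placeholders, and the parameter names and timestamp format are my reading of the add-on's S3 input settings, so check them against the inputs.conf.spec shipped with your installed version of the add-on.

```ini
# props.conf -- sketch; "my_s3_sourcetype" is a hypothetical sourcetype name
[my_s3_sourcetype]
SHOULD_LINEMERGE = false
# Assumption: one event per line; adjust the regex to match your data
LINE_BREAKER = ([\r\n]+)

# inputs.conf -- sketch; stanza name, account, bucket, and index are placeholders
[aws_s3://my_bucket_reindex]
aws_account = my_aws_account
bucket_name = my-bucket
sourcetype = my_s3_sourcetype
index = my_index
# Assumption: a UTC timestamp older than the oldest object in the bucket
initial_scan_datetime = 2000-01-01T00:00:00Z
```

Giving the new input a different name from the one you deleted is a guess at the safest option here, so that no old checkpoint can be matched to it.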

skawasaki_splun
Splunk Employee

I've done all of the steps above, and my generic S3 input is constantly stuck on this log message:

  2017-11-18 05:56:44,694 level=INFO pid=71734 tid=Thread-4 logger=splunk_ta_aws.modinputs.generic_s3.aws_s3_data_loader pos=aws_s3_data_loader.py:_do_index_data:95 | datainput="irs_990" bucket_name="splunk4good-irs-form-990" | message="The last data ingestion iteration hasn't been completed yet."

dolivasoh
Contributor

Personally, I like using splunk clean eventdata -index {{index}}. Saves a step.

Edit: this doesn't work on clustered indexes.
