How do I re-index an indexed S3 bucket?

jcoates_splunk
Splunk Employee

Just helped Support with this and want to document the results...

Let's say that I've indexed an S3 bucket, then realized that my line breakers were wrong and I need to reindex. The input now has a seek pointer that prevents it from picking up the old data again, so what do I do?

1 Solution

jcoates_splunk
Splunk Employee

If you didn't delete the data from S3, you should be fine.

  1. Delete the misindexed data. To soft-delete it, pipe a search for those events to the delete command, which makes them unsearchable but leaves them on disk. To truly nuke it, delete the index and make a new one.
  2. Fix your knowledge layer problem in props.conf, for instance by setting a sourcetype and turning off SHOULD_LINEMERGE.
  3. Delete the old modular input. It has cached a value for initial_scan_datetime that won't work for us, so we're going to configure a new input instead. See http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureInputs#S3_inputs for background.
  4. Add a new modular input and set initial_scan_datetime to a date earlier than the oldest data you want back. The add-on will then fetch all of your data again, and Splunk will line-break it properly.
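
As a rough sketch of steps 2 and 4, the two config pieces might look something like the following. The sourcetype name, input name, account name, bucket name, and index are all made up for illustration, and the exact setting names and timestamp format for the S3 input should be checked against the add-on docs linked above for your version.

```ini
# props.conf (wherever your parsing happens)
# "my_s3_sourcetype" is a hypothetical sourcetype name.
[my_s3_sourcetype]
# Treat each line as its own event instead of merging lines together.
SHOULD_LINEMERGE = false
# Break events on newlines (this is Splunk's default line breaker).
LINE_BREAKER = ([\r\n]+)

# inputs.conf for the Splunk Add-on for AWS
# A NEW input name, so no old checkpoint is reused. All values below are
# placeholders; the stanza layout is an assumption based on the generic S3 input.
[aws_s3://my_bucket_reindex]
aws_account = my_aws_account
bucket_name = my-example-bucket
sourcetype = my_s3_sourcetype
index = my_new_index
# Far enough back to cover the oldest object you want re-read.
initial_scan_datetime = 2000-01-01T00:00:00Z
```

Creating the new input through the add-on's UI should end up writing an equivalent stanza, so editing the file by hand is optional.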

skawasaki_splun
Splunk Employee

I've done all of the steps above, and my generic S3 input is constantly stuck on this message:

2017-11-18 05:56:44,694 level=INFO pid=71734 tid=Thread-4 logger=splunk_ta_aws.modinputs.generic_s3.aws_s3_data_loader pos=aws_s3_data_loader.py:_do_index_data:95 | datainput="irs_990" bucket_name="splunk4good-irs-form-990" | message="The last data ingestion iteration hasn't been completed yet."

dolivasoh
Contributor

Personally, I like using splunk clean eventdata -index {{index}} (run it while Splunk is stopped). Saves a step.

Edit: this doesn't work on clustered indexes.
