How do I re-index an indexed S3 bucket?

jcoates_splunk
Splunk Employee

Just helped Support with this and want to document the results...

Let's say that I've indexed an S3 bucket and then realized that my line breakers were wrong, so I need to reindex. The input now has a seek pointer that prevents me from picking up the old data, so what do I do?

1 Solution

jcoates_splunk
Splunk Employee

If you didn't delete the data from S3, you should be fine.

  1. Delete the misindexed data. To soft-delete it, pipe a search to the delete command (this needs the can_delete role and only hides the events from search; it doesn't free disk space). To truly nuke it, delete the index and make a new one.
  2. Fix your knowledge layer problem in props.conf, for instance by setting a sourcetype and turning off SHOULD_LINEMERGE.
  3. Delete the old modular input. It has cached a value for initial_scan_datetime that won't work for us, so we're going to configure a new input instead. See http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureInputs#S3_inputs for background.
  4. Add a new modular input and set initial_scan_datetime to a date earlier than the oldest object you want to pick up again. The add-on will now go get all of your data and Splunk will line break it properly.
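
Roughly, steps 2 and 4 could look like the sketch below. This is only an illustration: the sourcetype `my:s3:logs`, the input name `my_s3_reindex`, and the account, bucket, and index values are all made-up placeholders. The inputs.conf stanza and setting names are assumptions based on the add-on's generic S3 input and may differ between add-on versions, so check them against the docs linked above or simply create the input in the UI.

```ini
# props.conf (on whichever instance parses the data)
# Hypothetical sourcetype; treats every line as its own event.
[my:s3:logs]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

# inputs.conf in the add-on's local directory (assumed stanza and
# setting names; every value here is a placeholder)
[aws_s3://my_s3_reindex]
aws_account = my_aws_account
bucket_name = my-bucket
sourcetype = my:s3:logs
index = my_index
# Any timestamp older than the oldest object you want back.
# Assumed format: UTC, YYYY-MM-DDTHH:MM:SSZ.
initial_scan_datetime = 2000-01-01T00:00:00Z
```

The point of the new stanza name is that the add-on starts from a fresh checkpoint instead of reusing the old input's cached position.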

skawasaki_splun
Splunk Employee

I've done all of the steps above and my generic S3 input is constantly stuck on this log message:

2017-11-18 05:56:44,694 level=INFO pid=71734 tid=Thread-4 logger=splunk_ta_aws.modinputs.generic_s3.aws_s3_data_loader pos=aws_s3_data_loader.py:_do_index_data:95 | datainput="irs_990" bucket_name="splunk4good-irs-form-990" | message="The last data ingestion iteration hasn't been completed yet."

dolivasoh
Contributor

Personally, I like using splunk clean eventdata -index {{index}} (run it with Splunk stopped). It saves a step.

Edit: this doesn't work on clustered indexes.
