How do I re-index an indexed S3 bucket?

jcoates_splunk
Splunk Employee

Just helped Support with this and want to document the results...

Let's say I've indexed an S3 bucket, then realized that my line breakers were wrong and I need to reindex. The input now has a seek pointer that prevents it from picking up the old data, so what do I do?

1 Solution

jcoates_splunk
Splunk Employee

If you didn't delete the data from S3, you should be fine.

  1. Delete the misindexed data. To soft-delete it, use the delete search command, which hides the events from search but does not free disk space. To truly nuke it, delete the index and create a new one.
  2. Fix your knowledge layer problem in props.conf, for instance by setting a sourcetype and turning off SHOULD_LINEMERGE.
  3. Delete the old modular input. It has cached a value for initial_scan_datetime that won't work for us, so we're going to configure a new input instead. See http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureInputs#S3_inputs for background.
  4. Add a new modular input and set initial_scan_datetime to a date far enough in the past to cover all of the data in the bucket. The add-on will then fetch all of your data again, and Splunk will line-break it properly.
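As a rough illustration of steps 2 and 4, the configuration might look something like the sketch below. Every name here (my_s3_sourcetype, my_s3_input_v2, my_aws_account, my-bucket, my_new_index) is a hypothetical placeholder, the LINE_BREAKER value is only an example, and the aws_s3 stanza layout and timestamp format are assumptions that should be checked against the inputs.conf.spec shipped with your version of the add-on.

```ini
# props.conf (on the instance that parses the data)
# Hypothetical sourcetype name; adjust the breaker to match your data.
[my_s3_sourcetype]
# Treat each line as its own event instead of merging lines together.
SHOULD_LINEMERGE = false
# Example only: break events on one or more newline characters.
LINE_BREAKER = ([\r\n]+)

# inputs.conf (Splunk Add-on for AWS) -- a NEW input with a NEW name,
# so it does not reuse the checkpoint cached by the old input.
# Stanza and setting names are assumed; verify against inputs.conf.spec.
[aws_s3://my_s3_input_v2]
aws_account = my_aws_account
bucket_name = my-bucket
# A date well before the oldest object in the bucket (assumed format).
initial_scan_datetime = 2000-01-01T00:00:00Z
sourcetype = my_s3_sourcetype
index = my_new_index
```

The new input gets a different name from the old one so that the add-on starts from the configured initial_scan_datetime rather than from the position the old input had already reached.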

skawasaki_splun
Splunk Employee

I've done all of the steps above, but my generic S3 input is constantly stuck, logging this message:

  2017-11-18 05:56:44,694 level=INFO pid=71734 tid=Thread-4 logger=splunk_ta_aws.modinputs.generic_s3.aws_s3_data_loader pos=aws_s3_data_loader.py:_do_index_data:95 | datainput="irs_990" bucket_name="splunk4good-irs-form-990" | message="The last data ingestion iteration hasn't been completed yet."

dolivasoh
Contributor

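Personally, I like using splunk clean eventdata -index {{index}} for step 1 (run it while Splunk is stopped). It saves a step, because it empties the index without you having to delete and recreate it.

Edit: this doesn't work on clustered indexes.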
