How do I re-index an indexed S3 bucket?

jcoates_splunk
Splunk Employee

Just helped Support with this and want to document the results...

Let's say that I've indexed an S3 bucket, then realized that my line breakers were wrong and I need to reindex. The input now has a seek pointer that prevents it from picking up the old data again, so what do I do?

1 Solution

jcoates_splunk
Splunk Employee

If you didn't delete the data from S3, you should be fine.

  1. Delete the misindexed data. To soft-delete it, pipe a search for the affected events to the delete command, which makes them unsearchable but leaves them on disk. To truly nuke it, delete the index and make a new one.
  2. Fix the line-breaking problem in props.conf, for instance by setting a sourcetype and turning off SHOULD_LINEMERGE.
  3. Delete the old modular input. It has cached a value for initial_scan_datetime that won't work for us, so we're going to configure a new input instead. See http://docs.splunk.com/Documentation/AddOns/latest/AWS/ConfigureInputs#S3_inputs for background.
  4. Add a new modular input and set initial_scan_datetime to a date earlier than the oldest object you need. The add-on will now go get all of your data, and Splunk will line break it properly.
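
As a rough illustration of steps 2 and 4, the two config pieces might look something like the sketch below. Every name here (sourcetype, input name, account, bucket, index) is a placeholder, not something from the original case. The aws_s3 stanza and its setting names reflect my understanding of the add-on's inputs.conf, and the timestamp format is an assumption, so check both against the spec file shipped with your add-on version. Most people will create the input through the add-on UI rather than by hand.

```ini
# props.conf (step 2) -- "my_s3_sourcetype" is a hypothetical name.
# LINE_BREAKER below is just the one-event-per-line pattern; adjust it to your data.
[my_s3_sourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)

# inputs.conf (step 4) -- a NEW input with a new name, so no old checkpoint applies.
# All values are placeholders; the timestamp format is assumed, verify it in the docs.
[aws_s3://my_bucket_reindex]
aws_account = my_aws_account
bucket_name = my-bucket
sourcetype = my_s3_sourcetype
index = my_index
initial_scan_datetime = 2010-01-01T00:00:00Z
```

The main point is that the new input gets a different name from the one you deleted and a start time old enough to cover everything in the bucket.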

skawasaki_splun
Splunk Employee

I've done all of the steps above, and my generic S3 input is constantly stuck on this log message:

2017-11-18 05:56:44,694 level=INFO pid=71734 tid=Thread-4 logger=splunk_ta_aws.modinputs.generic_s3.aws_s3_data_loader pos=aws_s3_data_loader.py:_do_index_data:95 | datainput="irs_990" bucket_name="splunk4good-irs-form-990" | message="The last data ingestion iteration hasn't been completed yet."

dolivasoh
Contributor

Personally, I like using splunk clean eventdata -index {{index}} (run with Splunk stopped). It empties the index without deleting and recreating it, which saves a step.

edit: this doesn't work on clustered indexes
