HadoopConnect: How do I reset the HDFS input?

jonahtang
Explorer

I have a folder in HDFS into which log files are continuously written. To test the HadoopConnect app's import feature, I created a test index to store the data and added the folder as an input through the HadoopConnect web interface. The data imported successfully, and new files were indexed correctly as well.

I then decided to keep using the app, so I deleted the test index and the input and added the same input again. However, the HDFS file monitoring seems to keep some persistent state, because only new files are being indexed; the old ones are no longer picked up.

Is there a way to reset this persistent state? I tried deleting $SPLUNK_HOME/var/lib/splunk/persistentstorage/fschangemanager_state because it seemed like a good candidate, but to no avail. Please advise, thanks.

1 Solution

jonahtang
Explorer

Found it: the state data is in $SPLUNK_HOME/var/lib/splunk/modinputs/hdfs. Deleting the file(s) in this directory seems to make Splunk index everything again.
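For anyone who wants to script this, here is a minimal Python sketch of the cleanup described above. It only assumes the directory named in this answer; the function names, the `dry_run` flag, and the `/opt/splunk` fallback are hypothetical choices for illustration, not part of HadoopConnect or Splunk. It defaults to a dry run so you can review what would be deleted first.

```python
import os
from pathlib import Path
from typing import List


def hdfs_state_dir(splunk_home: str) -> Path:
    """Build the path to the state directory named in the answer above."""
    return Path(splunk_home) / "var" / "lib" / "splunk" / "modinputs" / "hdfs"


def reset_hdfs_input_state(splunk_home: str, dry_run: bool = True) -> List[Path]:
    """List (and, unless dry_run, delete) the regular files in the state directory.

    Returns the paths that were, or would be, deleted.
    The directory itself and any subdirectories are left in place.
    """
    state_dir = hdfs_state_dir(splunk_home)
    if not state_dir.is_dir():
        return []
    targets = sorted(p for p in state_dir.iterdir() if p.is_file())
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets


if __name__ == "__main__":
    # Assumption: SPLUNK_HOME is set in the environment;
    # /opt/splunk is only a fallback guess for this sketch.
    home = os.environ.get("SPLUNK_HOME", "/opt/splunk")
    for path in reset_hdfs_input_state(home, dry_run=True):
        print(f"would delete: {path}")
```

Once the dry-run listing looks right, calling `reset_hdfs_input_state(home, dry_run=False)` removes the files. Note that this clears the state for every HDFS input stored in that directory, not just one.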
