HadoopConnect: How do I reset the HDFS input?

jonahtang
Explorer

I have a folder in HDFS that has log files continuously being put into it. I decided to test the HadoopConnect app's import feature and created a test index to store data. Then, I added the folder to the input via the web interface on HadoopConnect. The data imports successfully, and new files are being indexed correctly too.

I then decided to use the app for real, so I deleted the test index and the input and added the same input again. However, the HDFS file monitoring seems to keep some persistent state: only new files are indexed, and the old ones are not indexed again.

Is there a way to reset this persistent state? I tried deleting $SPLUNK_HOME/var/lib/splunk/persistentstorage/fschangemanager_state because it seemed like a good candidate, but to no avail. Please advise. Thanks.

jonahtang
Explorer

Found it: the state data is in $SPLUNK_HOME/var/lib/splunk/modinputs/hdfs. Deleting the file(s) in this directory seems to make Splunk index everything again.
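
For anyone who wants to script this, here is a rough Python sketch of the same cleanup. It only covers the path mentioned above; the function name `clear_hdfs_modinput_state` and the `dry_run` flag are made up for illustration and are not part of Splunk or HadoopConnect. Stopping Splunk before deleting the files is probably wise, but that is an assumption and was not tested here.

```python
from pathlib import Path


def clear_hdfs_modinput_state(splunk_home, dry_run=True):
    """List (and optionally delete) the files under modinputs/hdfs.

    Hypothetical helper: the name and the dry_run flag are illustrative only.
    Returns the list of state files that were found.
    """
    state_dir = Path(splunk_home) / "var" / "lib" / "splunk" / "modinputs" / "hdfs"
    if not state_dir.is_dir():
        # Nothing to reset if the directory does not exist.
        return []

    # Only regular files directly inside the directory are considered.
    targets = sorted(p for p in state_dir.iterdir() if p.is_file())

    if not dry_run:
        for path in targets:
            path.unlink()  # remove the saved state file

    return targets
```

Calling `clear_hdfs_modinput_state(os.environ["SPLUNK_HOME"])` (after `import os`) only lists what would be removed; pass `dry_run=False` to actually delete the files. The directory itself is left in place.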
