HadoopConnect: How do I reset the HDFS input?

jonahtang
Explorer

I have a folder in HDFS that has log files continuously being put into it. I decided to test the HadoopConnect app's import feature and created a test index to store data. Then, I added the folder to the input via the web interface on HadoopConnect. The data imports successfully, and new files are being indexed correctly too.

I decided that I would like to use the app, so I deleted the test index and the input, then added the same input again. However, the HDFS file monitoring seems to keep some persistent state, because only new files are getting indexed; the old ones are no longer picked up.

Is there a way to reset this persistent state? I tried deleting $SPLUNK_HOME/var/lib/splunk/persistentstorage/fschangemanager_state because it seemed like a likely candidate, but that had no effect. Please advise, thanks.

1 Solution

jonahtang
Explorer

Found it: the state data is in $SPLUNK_HOME/var/lib/splunk/modinputs/hdfs. Deleting the file(s) in this directory seems to make Splunk index everything again.
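
For anyone who would rather not delete the state outright, here is a rough Python sketch that moves the files out of that directory into a backup folder instead. Only the modinputs/hdfs path comes from the answer above. The function name, the backup location, and the move-instead-of-delete approach are illustrative assumptions, not anything HadoopConnect provides. It is probably also safer to stop Splunk before touching these files, though I have not confirmed that is required.

```python
import shutil
from pathlib import Path


def reset_hdfs_input_state(splunk_home, backup_dir):
    """Move HDFS input state files into backup_dir.

    Hypothetical helper: the name and the backup approach are assumptions.
    Only the modinputs/hdfs location is taken from the post above.
    Returns the list of backup paths that were created.
    """
    state_dir = Path(splunk_home) / "var" / "lib" / "splunk" / "modinputs" / "hdfs"
    if not state_dir.is_dir():
        # Nothing to reset (wrong SPLUNK_HOME, or the input never ran).
        return []

    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for entry in sorted(state_dir.iterdir()):
        target = backup_dir / entry.name
        # Moving rather than deleting keeps the old state recoverable.
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved
```

Calling it with the value of $SPLUNK_HOME and a scratch directory of your choice should leave modinputs/hdfs empty, which matches the "delete the file(s)" step above while keeping a copy to restore if the re-index is not what you wanted.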

