
HadoopConnect: How do I reset the HDFS input?

jonahtang
Explorer

I have a folder in HDFS that has log files continuously being put into it. I decided to test the HadoopConnect app's import feature and created a test index to store data. Then, I added the folder to the input via the web interface on HadoopConnect. The data imports successfully, and new files are being indexed correctly too.

I decided that I wanted to keep using the app, so I deleted the test index and the input. Then I added the same input again. However, there seems to be some persistent state in the HDFS file monitoring, because only new files are getting indexed. The old ones are not indexed again.

Is there a way to reset this persistent state? I tried deleting $SPLUNK_HOME/var/lib/splunk/persistentstorage/fschangemanager_state because it seemed like a good candidate, but to no avail. Please advise, thanks.


jonahtang
Explorer

Found it: the state data is in $SPLUNK_HOME/var/lib/splunk/modinputs/hdfs. Deleting the file(s) in this directory seems to make Splunk index everything again.
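If you need to repeat this reset, it can be scripted. The following is a minimal Python sketch, assuming the state files sit directly in `$SPLUNK_HOME/var/lib/splunk/modinputs/hdfs` as described in the answer. The function name `reset_hdfs_modinput_state` is hypothetical, and stopping Splunk before deleting the files (for example with `$SPLUNK_HOME/bin/splunk stop`) is an added precaution that the original answer does not mention.

```python
import os
from pathlib import Path
from typing import List, Optional


def reset_hdfs_modinput_state(splunk_home: Optional[str] = None) -> List[str]:
    """Delete the regular files in $SPLUNK_HOME/var/lib/splunk/modinputs/hdfs.

    Assumption: Splunk (or at least the HDFS input) is stopped while this runs,
    so nothing rewrites the state files mid-deletion.
    Returns the names of the files that were removed, in sorted order.
    """
    home = splunk_home or os.environ.get("SPLUNK_HOME")
    if not home:
        raise RuntimeError("SPLUNK_HOME is not set and no path was given")

    # Directory named in the accepted answer.
    state_dir = Path(home) / "var" / "lib" / "splunk" / "modinputs" / "hdfs"
    removed: List[str] = []
    if not state_dir.is_dir():
        # Nothing to reset if the input has never stored any state.
        return removed

    for entry in sorted(state_dir.iterdir()):
        # Only delete regular files; leave any subdirectories untouched.
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Calling `reset_hdfs_modinput_state()` with no argument reads `SPLUNK_HOME` from the environment; pass a path explicitly to target a different installation. Restricting deletion to regular files is a conservative choice, since the answer only mentions deleting "the file(s)".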
