I'm storing log data in HDFS that is being indexed by Splunk. Due to space constraints, I'd like to delete data older than a certain age. I know I can do this by editing indexes.conf, but I wanted to check whether there are any gotchas I need to be aware of.
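For reference, age-based retention for a standard Splunk index is usually set with `frozenTimePeriodInSecs` in indexes.conf. The value is in seconds, and when buckets age past it they are frozen, which by default means deleted unless `coldToFrozenDir` or `coldToFrozenScript` is configured. Whether this setting alone governs data sitting in HDFS depends on how the HDFS storage is wired into Splunk, so treat it as an assumption. The small sketch below only does the days-to-seconds arithmetic and renders a stanza. The index name `my_hdfs_logs` and the helper functions are hypothetical.

```python
# Hypothetical helpers: turn a retention period in days into a value for the
# indexes.conf setting frozenTimePeriodInSecs (which is expressed in seconds).
SECONDS_PER_DAY = 24 * 60 * 60


def frozen_time_period_secs(retention_days: int) -> int:
    """Convert a retention period in whole days to seconds."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return retention_days * SECONDS_PER_DAY


def index_stanza(index_name: str, retention_days: int) -> str:
    """Render a minimal indexes.conf stanza; index_name is a placeholder."""
    secs = frozen_time_period_secs(retention_days)
    return f"[{index_name}]\nfrozenTimePeriodInSecs = {secs}\n"


if __name__ == "__main__":
    # 90 days of retention for a hypothetical index
    print(index_stanza("my_hdfs_logs", 90))
```

With 90 days this prints a stanza containing `frozenTimePeriodInSecs = 7776000`.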
I'm specifically interested in knowing:
1. Will Splunk correctly delete the log data from HDFS if I tell it to delete data older than a certain age? In other words, is there anything specific I need to know about deleting Splunk data from HDFS?
2. If, instead of deleting the data through Splunk, I used a script to automatically delete the files from HDFS, would that cause problems for Splunk (for example, the index expecting to find data that is now missing)? There might be some advantages to deleting the data from HDFS directly rather than depending on Splunk to do it.
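To make the second question concrete, here is a rough sketch of the selection logic such an external script might use: given file paths with modification times, it returns the ones older than a cutoff. It is purely illustrative. It does not talk to HDFS, does not delete anything, and says nothing about whether removing those files behind Splunk's back is safe, which is exactly what I'm asking. The function name and the example paths are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def paths_older_than(
    files: Iterable[Tuple[str, datetime]],
    max_age_days: int,
    now: datetime,
) -> List[str]:
    """Return paths whose modification time is strictly before now - max_age_days.

    'files' is a hypothetical list of (path, mtime) pairs, e.g. gathered from an
    HDFS directory listing; timestamps must be timezone-aware like 'now'.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [path for path, mtime in files if mtime < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    listing = [
        ("/splunk/archive/old_bucket", datetime(2024, 1, 1, tzinfo=timezone.utc)),
        ("/splunk/archive/new_bucket", datetime(2024, 1, 30, tzinfo=timezone.utc)),
    ]
    print(paths_older_than(listing, 7, now))
```

With a 7-day limit only the first (hypothetical) path is selected; the actual deletion step is deliberately left out.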
I'm quite new to working with Splunk as a developer, so I'd be grateful for any advice on the above. Thanks.