I'm using HadoopConnect, but it's putting pressure on the name node with too many frequent requests for recursive file listings (-lsr). How do we change the frequency to, say, every 5 or 10 minutes rather than every minute?
I'm looking for something similar to "auto" in Hadoop DB connect / settings, where we can configure the poll frequency.
Thanks.
I am assuming here that the lsr load is coming from the indexing component (modular inputs). Let us know if that is not the case.
Unfortunately, the polling frequency for HDFS-based inputs is not exposed as a configuration variable. However, you can easily modify it in the bin/hdfs.py file, where the interval is hard-coded in a time.sleep(60) call inside run():
def run():

    config = get_config()
    ....

    # check every 60 seconds for new entries
    time.sleep(60)
To make this configurable, I'd recommend following the docs on modular inputs and then tracing how the existing "whitelist"/"blacklist" parameters are defined and used, as a model for a new setting. As with any change to an app's default resources, be careful when upgrading the app: the new version will overwrite any changes you have made.
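As a rough illustration of what such a change might look like, here is a minimal sketch of a helper that reads a per-input polling interval and falls back to the current 60 seconds. Note the assumptions: the parameter name "poll_interval" and the helper get_poll_interval() are hypothetical and do not exist in the app, and the sketch assumes the per-input settings are available as a dict of strings, which may not match what get_config() actually returns in hdfs.py.

```python
import time

# Matches the value currently hard-coded in run().
DEFAULT_POLL_INTERVAL = 60  # seconds


def get_poll_interval(input_config, default=DEFAULT_POLL_INTERVAL):
    """Return the polling interval in seconds for one input.

    `input_config` is assumed to be a dict of that input's settings.
    The "poll_interval" key is hypothetical: it would have to be declared
    as a new input parameter, the same way whitelist/blacklist are.
    """
    raw = input_config.get("poll_interval")
    try:
        interval = int(raw)
    except (TypeError, ValueError):
        # Setting is missing or not a number: keep the old behaviour.
        return default
    # Reject zero or negative values so the loop never spins without pause.
    return interval if interval > 0 else default


if __name__ == "__main__":
    # Hypothetical settings for one indexed HDFS folder: poll every 5 minutes.
    example = {"poll_interval": "300"}
    print(get_poll_interval(example))
    # In run(), time.sleep(60) would then become something like:
    #     time.sleep(get_poll_interval(input_config))
```

Falling back to the default for missing or invalid values means inputs that do not set the new parameter keep polling every 60 seconds, as they do today.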
Hi, thanks for the answer. Yes, your assumption is correct. It's coming from an indexed HDFS input folder.
Regarding the fix, could you please suggest what changes need to be made to introduce a sleep-time variable per indexed HDFS input folder?
Thanks again.