Hadoop Connect: how to change the polling frequency of HDFS file listings to a longer interval?

Path Finder

I'm using Hadoop Connect, but it is putting pressure on the NameNode by issuing recursive file listings (-lsr) too frequently. How do we change the frequency to, say, every 5 or 10 minutes rather than every minute?
I'm looking for something similar to the "auto" option in the Hadoop DB connect settings, where we can configure the poll frequency.

Thanks.

Splunk Employee

I am assuming that the -lsr load is coming from the indexing component (modular inputs). Let us know if that is not the case.

Unfortunately, the polling frequency for HDFS-based inputs is not exposed as a configuration variable. However, you can easily change it by editing the sleep interval in the bin/hdfs.py file:

def run():

    config = get_config()
....

            # check every 60 seconds for new entries
            time.sleep(60)
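
If you would rather not hard-code a new number, here is a minimal sketch of reading the interval from the input's configuration and falling back to 60 seconds. It is not taken from hdfs.py: the `polling_interval` key and both helper functions are hypothetical, and it assumes the config object behaves like a dict, which may not match what get_config() actually returns.

```python
import time

DEFAULT_POLL_SECONDS = 60  # same value as the hard-coded sleep in the excerpt


def get_poll_interval(config, key="polling_interval", default=DEFAULT_POLL_SECONDS):
    """Return a polling interval in seconds from a dict-like config.

    `key` is a hypothetical setting name, not an existing Hadoop Connect option.
    Falls back to `default` when the value is missing, non-numeric, or below 1.
    """
    try:
        value = int(config.get(key, default))
    except (TypeError, ValueError):
        return default
    return value if value >= 1 else default


def poll_forever(config, check_once, sleep=time.sleep):
    """Run check_once(), then sleep for the configured interval, repeatedly."""
    interval = get_poll_interval(config)
    while True:
        check_once()
        sleep(interval)
```

With something like this, the `time.sleep(60)` call would become `time.sleep(get_poll_interval(config))`, and each input could carry its own value. The new setting would still have to be declared in the modular input's definition before Splunk passes it through.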

Splunk Employee

I'd recommend reading the docs on modular inputs and then tracing how the "whitelist"/"blacklist" parameters are defined and used. As with any change to an app's default resources, be careful when upgrading the app: the new version will overwrite any changes you have made.

Path Finder

Hi, thanks for the answer. Yes, your assumption is correct. It's coming from an indexed HDFS input folder.

Regarding the fix, could you please suggest what changes are needed to introduce a sleep-time variable per indexed HDFS input folder?

Thanks again.
