
Hadoop Connect - how do I change the polling frequency of HDFS file listings to a longer interval?

splunkears
Path Finder

I'm using Hadoop Connect, but it's putting pressure on the NameNode with too-frequent requests to list files recursively (-lsr). How do we change the frequency to, say, every 5 or 10 minutes instead of every minute?
I'm looking for something similar to "auto" in the Hadoop DB Connect settings, where the poll frequency can be configured.

Thanks.


Ledion_Bitincka
Splunk Employee

I am assuming here that the -lsr load is coming from the indexing component (modular inputs); let us know if that is not the case.

Unfortunately, the polling frequency for HDFS-based inputs is not exposed as a configuration variable. However, you can easily change it in the run() function in bin/hdfs.py (around line 323 in the version discussed here):

def run():

    config = get_config()
    ...

            # check every 60 seconds for new entries
            time.sleep(60)
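If you'd rather not hard-code a new number, one option is to read the interval from a setting and keep 60 seconds as the fallback. This is a minimal, self-contained sketch, not Hadoop Connect code: the environment variable POLL_INTERVAL_SECONDS and the helpers get_poll_interval and poll_loop are hypothetical names, and check_for_new_entries stands in for whatever the real loop in bin/hdfs.py does between sleeps. How an environment variable would reach the script depends on how splunkd is started, so treat that part as an assumption too.

```python
import os
import time

DEFAULT_INTERVAL = 60  # the value currently hard-coded in bin/hdfs.py


def get_poll_interval(env=None):
    """Return the polling interval in seconds (hypothetical helper).

    Reads the hypothetical POLL_INTERVAL_SECONDS variable and falls back to
    the original 60 seconds if it is missing, non-numeric, or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get("POLL_INTERVAL_SECONDS", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_INTERVAL
    return value if value > 0 else DEFAULT_INTERVAL


def poll_loop(check_for_new_entries, interval, iterations=None, sleep=time.sleep):
    """Call check_for_new_entries(), then sleep `interval` seconds, repeatedly.

    iterations=None loops forever, like the real input; a number is only
    useful for testing. `sleep` is injectable for the same reason.
    """
    count = 0
    while iterations is None or count < iterations:
        check_for_new_entries()
        sleep(interval)
        count += 1
```

For example, get_poll_interval({"POLL_INTERVAL_SECONDS": "600"}) returns 600, so the listing would run every 10 minutes; a missing or invalid value keeps the current 60-second behaviour.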


Ledion_Bitincka
Splunk Employee

To add a per-input setting, I'd recommend following the docs on modular inputs and then following how the "whitelist"/"blacklist" parameters are defined and used. As with changes to any other default app resource, be careful when upgrading the app: the new version will overwrite any changes you have made.
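Following that suggestion, a per-folder interval would be one more stanza parameter, read the same way as whitelist/blacklist. Splunk passes each modular input's configuration to the script as XML on stdin (configuration > stanza name="..." > param name="..."). The sketch below assumes a hypothetical poll_interval parameter, which is not an existing Hadoop Connect setting and would presumably also need to be declared wherever whitelist/blacklist are declared (the input's scheme and inputs.conf.spec). It parses the value per stanza and, on each pass of a shorter loop, lists only the folders that are due. All function names and the stanza names in the test are illustrative.

```python
import time
import xml.etree.ElementTree as ET

DEFAULT_INTERVAL = 60  # the value currently hard-coded in bin/hdfs.py


def read_intervals(config_xml):
    """Map each input stanza name to its poll interval in seconds.

    Reads a hypothetical per-stanza "poll_interval" param from the
    modular-input configuration XML; missing, non-numeric, or non-positive
    values fall back to DEFAULT_INTERVAL.
    """
    root = ET.fromstring(config_xml)
    intervals = {}
    for stanza in root.iter("stanza"):
        interval = DEFAULT_INTERVAL
        for param in stanza.findall("param"):
            if param.get("name") != "poll_interval":
                continue
            try:
                value = int((param.text or "").strip())
            except ValueError:
                continue
            if value > 0:
                interval = value
        intervals[stanza.get("name")] = interval
    return intervals


def due_stanzas(intervals, last_run, now):
    """Return sorted stanza names whose interval has elapsed since their last listing."""
    return sorted(
        name
        for name, interval in intervals.items()
        if now - last_run.get(name, float("-inf")) >= interval
    )


def run_loop(intervals, list_stanza, tick=10, clock=time.monotonic,
             sleep=time.sleep, iterations=None):
    """Wake every `tick` seconds and list only the stanzas that are due.

    list_stanza(name) stands in for the existing recursive listing of one
    input path. iterations=None loops forever; a number is for testing.
    """
    last_run = {}
    count = 0
    while iterations is None or count < iterations:
        now = clock()
        for name in due_stanzas(intervals, last_run, now):
            list_stanza(name)
            last_run[name] = now
        sleep(tick)
        count += 1
```

The loop wakes every tick seconds (10 is an arbitrary choice here) and only lists the paths whose own interval has elapsed, so keeping the tick smaller than the shortest interval keeps each folder's schedule reasonably accurate while the NameNode sees one -lsr per folder per interval.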



splunkears
Path Finder

Hi, thanks for the answer. Yes, your assumption is correct: it's coming from an indexed HDFS input folder.

Regarding the fix, could you please suggest what changes need to be made to introduce a sleep-time variable per indexed HDFS input folder?

Thanks again.
