I have an ActiveBatch setup that generates many files (tens of thousands) in a folder. I'd like Splunk to read only the files freshly generated in these ActiveBatch folders. I am using the setting followTail=1 for now, and it works OK. Is there a better way to do this?
It took Splunk several hours at 100% CPU usage to go through a couple of such folders (with 30K files each). The files are generated once and are never modified after that (so "following their tail" is useless).
Is there a way to tell that to Splunk? A setting similar to followTail but that would tell it to:
There is a setting for ignoring old files:
ignoreOlderThan = <time window>
* Causes the monitored input to stop checking files for updates if their modtime has passed this threshold. This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers of historical files (for example, when active log files are colocated with old files that are no longer being written to).
* A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.
* Value must be: <number><unit> (e.g., 7d is one week). Valid units are d (days), m (minutes), and s (seconds).
* Default: disabled.
Excellent! This seems to be quite suitable for this. Ignoring files older than 2 days will cover every situation in this case. Thanks!
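For anyone landing here later, a minimal sketch of what such a stanza might look like in inputs.conf, assuming a 2-day window as mentioned above. The monitored path is a made-up placeholder, not the real ActiveBatch folder, so substitute your own:

```
# inputs.conf
# NOTE: the path below is a hypothetical placeholder; use your own folder.
[monitor:///path/to/activebatch/output]
disabled = false
# Skip files whose modtime is more than 2 days old
ignoreOlderThan = 2d
```

A restart of the Splunk instance is normally needed for an inputs.conf change to take effect.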
I'm not getting 'ignoreOlderThan' to work?
disabled = false
index = [redacted]
blacklist = 201[0-9]-[0-1][0-8]
sourcetype = syslog
The directory is full of syslog files from rsyslog. When I do a 'splunk list monitor', it's showing files with dates going back to 2017-12. (PS: the blacklist was my attempt to stop it monitoring old files.)
Like the OP above, I have files created each day, but thousands of them. I don't want the UF to 'monitor' the files, just import any new ones. Once the files are created, they are never written to.