How to tell Splunk to read log files only once, but keep monitoring the folder for new files?

Path Finder

I have an ActiveBatch setup that generates many files (tens of thousands) in each folder. I'd like Splunk to read only newly generated files in these ActiveBatch folders. For now I am using followTail=1, and it works OK, but is there a better way to do this?

It took Splunk several hours at 100% CPU to get through a couple of these folders (30K files each). Each file is written only while its job runs and is never modified after that, so "following its tail" is useless.

Is there a way to tell Splunk this? I'm looking for a setting similar to followTail, but one that accounts for the following:

  • look only at new files in a folder (ignore any files that existed before the input was defined in Splunk)
  • each file is created when its job starts running and grows for some time (anywhere from 1 second to several hours, depending on how long the job takes to complete)
  • once the job is finished, the log file is never modified again (there is no point tailing it anymore)
  • there are tens of thousands of such files in several folders (tailing all of them appears to take a serious toll on splunkd)
  • each of these files ends with a common section that can be used to determine that no more monitoring is necessary (that common section is shown in this question)

Splunk Employee

There is a setting for ignoring old files:

ignoreOlderThan = <time window>
* Causes the monitored input to stop checking files for updates if their modtime has passed this threshold.
  This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers
  of historical files (for example, when active log files are colocated with old files that are no longer
  being written to).
* A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.
* Value must be: <number><unit> (e.g., 7d is one week).  Valid units are d (days), m (minutes), and s (seconds).
* Default: disabled.
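
To preview which files a given window would still pick up before enabling the setting, here is a minimal Python sketch. It is not Splunk's implementation. It assumes the window is compared against each file's modtime as the spec text describes, it accepts only the units listed there (d, m, s), and it reads only the top level of one directory, whereas a monitor stanza recurses into subdirectories by default.

```python
import os
import re
import time
from typing import List, Optional

# Seconds per unit, taken from the units listed in the spec excerpt:
# d (days), m (minutes), s (seconds).
UNIT_SECONDS = {"d": 86400, "m": 60, "s": 1}


def parse_window(value: str) -> int:
    """Convert a '<number><unit>' value such as '2d' into seconds."""
    match = re.fullmatch(r"(\d+)([dms])", value.strip())
    if match is None:
        raise ValueError(f"expected <number><unit>, got {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]


def within_window(mtime: float, window: str, now: float) -> bool:
    """True if a file last modified at `mtime` is no older than `window`."""
    return now - mtime <= parse_window(window)


def files_within_window(directory: str, window: str,
                        now: Optional[float] = None) -> List[str]:
    """List regular files directly under `directory` (not recursive)
    whose modtime falls inside `window`."""
    now = time.time() if now is None else now
    return sorted(
        entry.path
        for entry in os.scandir(directory)
        if entry.is_file() and within_window(entry.stat().st_mtime, window, now)
    )
```

For example, files_within_window("/path/to/activebatch/logs", "2d") (the path is hypothetical) lists the files that a 2-day window would still consider; everything older would be skipped.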

Path Finder

Excellent! This looks well suited to my case. Ignoring files older than 2 days will cover every situation here. Thanks!

New Member

I can't get 'ignoreOlderThan' to work:

[monitor:///[redacted/]
disabled = false
index = [redacted]
ignoreOlderThan=3d
blacklist = 201[0-9]-[0-1][0-8]
sourcetype = syslog

The directory is full of syslog files from rsyslog. When I run 'splunk list monitor', it still shows files dated as far back as 2017-12. (PS: the blacklist was my attempt to stop it from monitoring old files.)

Like the OP above, I have thousands of files created each day. I don't want the UF (universal forwarder) to keep 'monitoring' the files, just to import any new ones. Once the files are created, they are never written to.
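
To check which file names the blacklist in the stanza above actually catches, here is a small Python sketch. It is an approximation: it assumes Splunk applies the blacklist as an unanchored regex search over the file path, and the file names are hypothetical rsyslog-style names. With this pattern, only months 01-08 and 10-12 of the years 2010-2019 match, so September files slip through, while files dated 2018 are caught even if they are new.

```python
import re

# The blacklist value from the stanza above. re.search is used here only
# as an approximation of how Splunk matches blacklist against a path.
BLACKLIST = re.compile(r"201[0-9]-[0-1][0-8]")


def is_blacklisted(path: str) -> bool:
    """True if the blacklist regex matches anywhere in `path`."""
    return BLACKLIST.search(path) is not None


# Hypothetical rsyslog file names, for illustration only.
for path in [
    "/var/log/remote/host1-2017-12-03.log",  # December 2017: matched
    "/var/log/remote/host1-2017-09-03.log",  # September 2017: not matched
    "/var/log/remote/host1-2018-01-15.log",  # January 2018: matched
]:
    print(path, is_blacklisted(path))
```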
