How do I tell Splunk to read log files only once, but keep monitoring the folder for new files?

zsimic
Path Finder

I have an ActiveBatch setup that generates many files (tens of thousands) in a folder. I'd like to have Splunk read only files freshly generated in these ActiveBatch folders. I am using the setting followTail=1 for now, and it works OK. Is there a better way to do this?

It took Splunk several hours at 100% CPU usage to go through a couple of these folders (with 30K files each). The files are generated once and are never modified after that, so "following their tail" is useless.

Is there a way to tell Splunk that? I'm looking for a setting similar to followTail that fits this situation:

  • Splunk should look only at new files in a folder (ignoring any files that existed before the input was defined in Splunk)
  • each file is created when the corresponding job starts running, and it grows for some time (anywhere from 1 second to several hours, depending on how long the job takes to complete)
  • once the corresponding job has finished, the log file will never be modified again (no use tailing it any more)
  • there are tens of thousands of such files, in several folders (it looks like tailing all those files is taking a serious toll on splunkd)
  • each of these files has a common section at the end that can be used to determine that no more monitoring is necessary (you can see that common section in this question)

Simeon
Splunk Employee

There is a setting for ignoring old files:

ignoreOlderThan = <time window>
* Causes the monitored input to stop checking files for updates if their modtime has passed this threshold.
  This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers
  of historical files (for example, when active log files are colocated with old files that are no longer
  being written to).
* A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.
* Value must be: <number><unit> (e.g., 7d is one week).  Valid units are d (days), m (minutes), and s (seconds).
* Default: disabled.

zsimic
Path Finder

Excellent! This seems to be quite suitable for this. Ignoring files older than 2 days will cover every situation in this case. Thanks!
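
For reference, a minimal sketch of what such an input might look like in inputs.conf, assuming the two-day window mentioned above. The monitored path is a hypothetical placeholder and not taken from this thread; only the ignoreOlderThan setting comes from the answer.

```ini
# inputs.conf (sketch; the path below is a placeholder, not from this thread)
[monitor:///path/to/activebatch/logs]
disabled = false
# Stop checking files whose modtime is more than 2 days old. Per the spec text
# quoted in the answer, a file that is already older than this window when it
# is first seen will not be indexed at all.
ignoreOlderThan = 2d
```

Note that the threshold is based on each file's modification time, so a job that runs for several hours keeps its log file inside the window for as long as it is still being written.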

dforstermg
New Member

I can't get 'ignoreOlderThan' to work.

[monitor:///[redacted/]
disabled = false
index = [redacted]
ignoreOlderThan=3d
blacklist = 201[0-9]-[0-1][0-8]
sourcetype = syslog

The directory is full of syslog files from rsyslog. When I run 'splunk list monitor', it shows files with dates going back to 2017-12. (PS: the blacklist was my attempt to stop it from monitoring old files.)

Like the OP above, I have files created each day, thousands of them. I don't want the UF to 'monitor' the files, just to import any new ones. Once the files are created, they are never written to again.
