Getting Data In

UniForwarder - only new log files with date mask

frusso
Engager

I am in the process of implementing Splunk in a fairly long-lived environment.  Log directories contain date-masked log files.  I would like to ignore files before today's date, and only import new files. 

Example: 

/opt/someApplication/logs/someApplication.202412160600.out

I am unable to wildcard /opt/someApplication/logs/someApplication.*.out as there are logs dating back to 2017 and I'd exceed our daily license/quota by several orders of magnitude.  Changing the logging format is not an option.  Exclude-lists appear to be a solution, but even using regex would be incredibly burdensome.

Thoughts?

 

 

1 Solution

isoutamo
SplunkTrust
You should definitely move the old logs into a separate archive directory on the source side. Depending on the OS and its version, the number of files in that directory could soon become a serious bottleneck, or may already be one. I have seen environments where even ls or dir stopped working because of the number of files.

ignoreOlderThan is what you could try, BUT remember that it looks at the file's modification time. If someone updates the mtime of a file, Splunk will read it regardless of when its content was actually last modified.

ignoreOlderThan = <non-negative integer>[s|m|h|d]
* The monitor input compares the modification time on files it encounters
with the current time. If the time elapsed since the modification time
is greater than the value in this setting, Splunk software puts the file
on the ignore list.
* Files on the ignore list are not checked again until the Splunk
platform restarts, or the file monitoring subsystem is reconfigured. This
is true even if the file becomes newer again at a later time.
* Reconfigurations occur when changes are made to monitor or batch
inputs through Splunk Web or the command line.
* Use 'ignoreOlderThan' to increase file monitoring performance when
monitoring a directory hierarchy that contains many older, unchanging
files, and when removing or adding a file to the deny list from the
monitoring location is not a reasonable option.
* Do NOT select a time that files you want to read could reach in
age, even temporarily. Take potential downtime into consideration!
* Suggested value: 14d, which means 2 weeks
* For example, a time window in significant numbers of days or small
numbers of weeks are probably reasonable choices.
* If you need a time window in small numbers of days or hours,
there are other approaches to consider for performant monitoring
beyond the scope of this setting.
* NOTE: Most modern Windows file access APIs do not update file
modification time while the file is open and being actively written to.
Windows delays updating modification time until the file is closed.
Therefore you might have to choose a larger time window on Windows
hosts where files may be open for long time periods.
* Value must be: <number><unit>. For example, "7d" indicates one week.
* Valid units are "d" (days), "h" (hours), "m" (minutes), and "s"
(seconds).
* No default, meaning there is no threshold and no files are
ignored for modification time reasons
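
As a rough sketch of how this setting might be applied to the path from the question, a monitor stanza in inputs.conf on the forwarder could look like the following. The index and sourcetype values are placeholders that do not come from the original post, and the 14d window simply follows the suggested value in the spec above; treat this as a starting point under those assumptions, not a tested configuration.

```ini
# inputs.conf on the Universal Forwarder (illustrative sketch)
[monitor:///opt/someApplication/logs/someApplication.*.out]
# Put files whose modification time is older than 14 days on the ignore list.
ignoreOlderThan = 14d
# Hypothetical values; replace with your own index and sourcetype.
index = some_index
sourcetype = someApplication:out
disabled = false
```

A shorter window such as 1d would come closer to "only today's files", but the spec warns against choosing a window that files you want could exceed, for example during forwarder downtime. Because the check is based on mtime rather than the date in the file name, any old file that was touched recently would still be read.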


PickleRick
SplunkTrust

You can use the ignoreOlderThan option, but be aware that the forwarder still has to keep track of each file's existence and metadata. If you have many files in the source directory, you might need to raise process limits, and you'll be wasting resources on files you don't care about.
