ignoreOlderThan in inputs.conf vs. number of files in "splunk list monitor" (performance)

Path Finder

Hi All,

I'm trying to improve the performance of a Splunk instance by optimizing its configuration, e.g. setting sourcetype explicitly instead of leaving it automatic. One data input monitors a directory that contains about 20,000 files. I've added ignoreOlderThan = 7d to inputs.conf.

The question is:
- Does adding ignoreOlderThan to inputs.conf make Splunk ignore those files? Should I then see fewer files being monitored in ./splunk list monitor as well as under Data Inputs in Splunk Web?
- Or is moving those files OUT of the monitored directory the only way to reduce the number of files being monitored?

Thank you.

Esteemed Legend

Using ignoreOlderThan will cause Splunk to totally ignore the affected files forever: only the filename will be checked. HOWEVER, with 20k files (most of which you are "ignoring"), you still pay the OS-level cost of listing a directory that is too cluttered (the calls to stat), plus the cost of walking through that list, even though you know most of the files are permanently useless to you.

To avoid all of these problems, check out my answer (and the others) here:
http://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html#ans...
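
The directory-scan cost described above can be illustrated outside of Splunk. This is a minimal Python sketch, not Splunk's actual implementation: the function name `split_by_age` and its parameters are hypothetical. It only shows that an age-based filter has to list the directory and stat every file before it can decide which ones to skip.

```python
import os
import time


def split_by_age(directory, max_age_seconds, now=None):
    """Return (recent, old, examined) for the regular files in `directory`.

    Hypothetical helper for illustration only; it does not reproduce
    how Splunk tracks monitored files internally.
    """
    now = time.time() if now is None else now
    recent, old, examined = [], [], 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            examined += 1
            # The age check needs the modification time, so every file is
            # stat'ed, including the ones that end up being skipped.
            mtime = entry.stat().st_mtime
            if now - mtime <= max_age_seconds:
                recent.append(entry.name)
            else:
                old.append(entry.name)
    return sorted(recent), sorted(old), examined
```

Calling `split_by_age("/path/to/dir", 7 * 86400)` on a directory of 20,000 files would examine all 20,000 entries, however few of them fall inside the 7-day window.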

Path Finder

Thanks for answering. What's odd, though, is that Splunk doesn't seem to be ignoring the files. For example, out of the 20,000 files, say 1,000 are from the last 7 days.

If I set ignoreOlderThan = 7d and restart Splunk, the ./splunk list monitor output still shows all 20,000 files, so it doesn't look like they're ignored at all.

Path Finder

OK, so I did a test and set up a test instance with a data input monitoring a directory with 206 files. The files inside range from May to September (yesterday).

In my inputs.conf, it's set to:

[monitor:///<dir>]
disabled = false
index = test_index
sourcetype = _json
ignoreOlderThan = 2d

There are only 13 files from the last two days, yet both the Data Inputs web view and ./splunk list monitor show the following:
[screenshot not preserved]

Is this expected? It seems that Splunk is still monitoring the whole directory. With few files it probably doesn't matter, but it will definitely slow down over time as more files pile up (assuming one doesn't rotate them out)...
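
For reference, rotating old files out of the monitored directory could look roughly like the following. This is a minimal Python sketch under stated assumptions: the function name `archive_old_files`, its parameters, and the idea of a separate archive directory are hypothetical and not taken from this thread or from Splunk. The archive directory is assumed to live outside the monitored path, because a monitor input on a directory also picks up its subdirectories.

```python
import os
import shutil
import time


def archive_old_files(monitored_dir, archive_dir, max_age_days, now=None):
    """Move regular files older than `max_age_days` into `archive_dir`.

    Hypothetical helper: `archive_dir` is assumed to be outside
    `monitored_dir`. Returns the sorted names of the files moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    os.makedirs(archive_dir, exist_ok=True)
    # Collect candidates first so the directory is not modified
    # while it is still being iterated.
    with os.scandir(monitored_dir) as entries:
        old = [(e.path, e.name) for e in entries
               if e.is_file() and e.stat().st_mtime < cutoff]
    for path, name in old:
        shutil.move(path, os.path.join(archive_dir, name))
    return sorted(name for _, name in old)
```

Run on a schedule (for example from cron), something like this keeps the monitored directory small, so there is simply less to list and stat regardless of how ignoreOlderThan behaves.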

Esteemed Legend

I have not used btool to verify the function of ignoreOlderThan, but your test surprises me. I would open a case with support.

Path Finder

Sorry for not updating this. After further testing, we were able to confirm the behavior: Splunk keeps monitoring files it has already indexed even if ignoreOlderThan is set, unless the setting is in place before the indexing happens. If ignoreOlderThan is set after files are indexed, only new files will conform to the ignoreOlderThan config.

Esteemed Legend

Which is pretty much what I was telling you (and why I pointed you to my other answer, which is a good way around this whole mess). You can flip back and forth between using ignoreOlderThan and not using it by adding or removing the setting: no problem. It is no surprise to find that Splunk is still monitoring the files to some degree, because it has to mark them as inactive and store that state somehow/somewhere. The way to test whether the ignoreOlderThan setting is working is to leave a file unchanged for the configured number of days, at which point Splunk will mark it to ignore FOREVER. Then send new events to the file and confirm that those new events are not forwarded, which is the intention of the setting (but not what most people expect).
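
The test described above can be prepared with two small helpers. This is a minimal Python sketch; the names `backdate` and `append_event` are hypothetical. It assumes that backdating a file's modification time is an acceptable stand-in for actually waiting out the window, which is an assumption and not something confirmed in this thread.

```python
import os
import time


def backdate(path, days, now=None):
    """Set the file's access and modification times `days` in the past.

    Assumption: this stands in for leaving the file untouched for
    that long. Returns the timestamp that was applied.
    """
    now = time.time() if now is None else now
    past = now - days * 86400
    os.utime(path, (past, past))
    return past


def append_event(path, message):
    """Append one line to the file, which also refreshes its mtime."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(message + "\n")
```

One possible sequence: backdate a test file past the ignoreOlderThan window, give Splunk time to scan the directory, call `append_event`, and then search the index to see whether the new line arrived.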
