Hi All,
I'm trying to improve the performance of a Splunk instance, for example by setting sourcetype explicitly instead of letting it be detected automatically. There is a data input monitoring a directory that contains about 20,000 files. I've added ignoreOlderThan = 7d to inputs.conf.
My questions are:
- Does adding ignoreOlderThan to inputs.conf make Splunk ignore those files? If so, should I see fewer files being monitored in splunk list monitor, as well as under Data Inputs in Splunk Web?
- Or is the only way to reduce the number of monitored files to move them OUT of the monitored directory?
Thank you.
Using ignoreOlderThan will cause Splunk to totally ignore files older than the threshold, forever: only the filename will be checked. HOWEVER, with 20k files (most of which you are "ignoring"), you still pay the OS-level cost of listing an overly cluttered directory and calling stat on every file in it, plus the slowness of walking through that list when you know most of the files are permanently useless to you.
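To make that per-file cost concrete, here is a minimal Python sketch (an illustration, not Splunk's implementation) that splits the files in a directory into recent and stale by modification time, which is what ignoreOlderThan compares against the threshold. The function name classify_by_mtime is hypothetical. Note that every file must still be listed and have its metadata read before it can be classified as stale.

```python
import os
import time
from typing import List, Optional, Tuple


def classify_by_mtime(
    directory: str, max_age_seconds: float, now: Optional[float] = None
) -> Tuple[List[str], List[str]]:
    """Split regular files in `directory` into (recent, stale) by modification time.

    Hypothetical helper: it only illustrates that a stale file still has to be
    listed and inspected before it can be skipped.
    """
    now = time.time() if now is None else now
    recent: List[str] = []
    stale: List[str] = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-regular entries
            # Reading each file's metadata is required whether or not it ends up stale.
            mtime = entry.stat().st_mtime
            if now - mtime > max_age_seconds:
                stale.append(entry.name)
            else:
                recent.append(entry.name)
    return sorted(recent), sorted(stale)
```

With 20,000 files and a 7-day threshold, the stale list would be most of the directory, yet all of it is still scanned on every pass.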
To avoid all of these problems, check out my answer (and the others) here:
http://answers.splunk.com/answers/309910/how-to-monitor-a-folder-for-newest-files-only-file.html#ans...
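For the second question (moving stale files out of the monitored directory), here is a minimal Python sketch of one generic approach. It is not necessarily what the linked answer does. It assumes a hypothetical archive directory that Splunk does not monitor; in particular, it must not be a subdirectory of a recursively monitored path.

```python
import os
import shutil
import time
from typing import List, Optional


def archive_stale_files(
    monitored_dir: str, archive_dir: str, max_age_days: float, now: Optional[float] = None
) -> List[str]:
    """Move regular files older than max_age_days from monitored_dir to archive_dir.

    Hypothetical helper. archive_dir is assumed to be outside anything Splunk
    monitors, so moved files drop out of the monitor's file list entirely.
    Returns the names of the files that were moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    os.makedirs(archive_dir, exist_ok=True)
    moved: List[str] = []
    for name in sorted(os.listdir(monitored_dir)):
        src = os.path.join(monitored_dir, name)
        # Only move regular files whose last modification is before the cutoff.
        if os.path.isfile(src) and os.path.getmtime(src) < cutoff:
            shutil.move(src, os.path.join(archive_dir, name))
            moved.append(name)
    return moved
```

Run periodically (for example from cron) as archive_stale_files("/path/to/monitored", "/path/to/archive", 7), where both paths are placeholders. Unlike ignoreOlderThan, this keeps the monitored directory small, so the per-scan listing and stat cost shrinks too.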
Thanks for answering. What's odd, though, is that Splunk doesn't seem to be ignoring the files. For example, say 1,000 of the 20,000 files are from the last 7 days.
If I set ignoreOlderThan = 7d and restart Splunk, the splunk list monitor output still shows all 20,000 files, so they don't appear to be ignored at all.
OK, so I ran a test: I set up a test instance with a data input monitoring a directory of 206 files. The files range from May to September (yesterday).
In my inputs.conf, it's set to:
[monitor:///<dir>]
disabled = false
index = test_index
sourcetype = _json
ignoreOlderThan = 2d
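The ignoreOlderThan value is a number followed by a unit, so 2d above means 172,800 seconds. Here is a small hypothetical Python helper that performs the conversion, assuming the <number><unit> format with units s, m, h and d described in the inputs.conf specification. It is handy for reproducing the same cutoff in your own scripts when you count how many files fall inside the window.

```python
import re

# Seconds per unit, assuming the s/m/h/d units from the inputs.conf spec.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_ignore_older_than(value: str) -> int:
    """Convert an ignoreOlderThan-style value such as '2d' or '7d' to seconds.

    Hypothetical helper; raises ValueError for anything not shaped <number><unit>.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([smhd])\s*", value)
    if match is None:
        raise ValueError(f"unrecognized ignoreOlderThan value: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_SECONDS[unit]
```

For example, time.time() - parse_ignore_older_than("2d") gives the modification-time cutoff that the 13 "recent" files above should fall after.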
There are only 13 files from the last two days, yet the Data Inputs web view and even ./splunk list monitor still list the older files as well.
Is this expected? It seems that Splunk is still monitoring the whole directory. With only a few files this probably doesn't matter, but it will definitely slow down over time as more files pile up (assuming nobody rotates them out)...
I have not used btool to verify the behavior of ignoreOlderThan, but your test surprises me. I would open a case with Support.
Sorry for not updating this. After further testing, we were able to confirm the behavior: Splunk keeps monitoring files it has already indexed even if ignoreOlderThan is set, unless the setting was in place before those files were indexed. If ignoreOlderThan is added after files have been indexed, only new files will be subject to it.
Which is pretty much what I was telling you (and why I pointed you to my other answer, which is a good way around this whole mess). You can flip back and forth between using ignoreOlderThan and not, by adding or removing the setting, with no problem. It is no surprise that Splunk is still monitoring those files to some degree, because it has to mark them as inactive and store that state somewhere. The way to test whether the ignoreOlderThan setting is working is to wait the configured number of days with no changes to a file, at which point Splunk will mark it as ignored FOREVER. Then send new events to the file and confirm that those new events are not forwarded, which is the intention of the setting (but not what most people expect).
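If waiting several days is impractical, one possible shortcut is to backdate a test file's modification time with os.utime, since ignoreOlderThan is based on modification time. This is an assumption, not a documented Splunk testing procedure: whether Splunk treats a backdated file exactly like one that aged naturally should be verified. The helper names backdate and append_event are hypothetical.

```python
import os
import time
from typing import Optional


def backdate(path: str, age_days: float, now: Optional[float] = None) -> float:
    """Set path's access and modification times to age_days in the past.

    Assumption: this stands in for "wait the configured number of days with no
    change"; verify that Splunk marks the file as ignored before relying on it.
    Returns the new modification time.
    """
    now = time.time() if now is None else now
    past = now - age_days * 86400
    os.utime(path, (past, past))
    return past


def append_event(path: str, line: str) -> None:
    """Append one newline-terminated event line (this also bumps mtime to now)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
```

A possible sequence: write a first event, backdate the file past the threshold, give Splunk time to check it, then call append_event again and confirm the new event is not forwarded.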