Getting Data In

ignoreolderthan not working?

Communicator

Hi,

I have 2 stanza in inputs.conf:

[monitor:///data3/caa/caa7/]
whitelist=access.*gz
ignoreOlderThan=1d
disabled = false
followTail = 0

[monitor:///data3/radius/timpani]
whitelist=radius.log*gz
ignoreOlderThan=1d
sourcetype=radius_log
disabled = false
followTail = 0

When I run "splunk list monitor", I find that all files (access.*gz) are listed under "Monitored Directories":

   /data3/caa/caa7/
            /data3/caa/caa7/access.20120101-140525.gz
            /data3/caa/caa7/access.20120102-140525.gz
            /data3/caa/caa7/access.20120103-140525.gz
            /data3/caa/caa7/access.20120104-101251.gz
            /data3/caa/caa7/access.20120105-013825.gz
            /data3/caa/caa7/access.20120105-174037.gz
            /data3/caa/caa7/access.20120106-142004.gz

while for the 2nd stanza no filenames are listed and only the directory is shown:

   /data3/radius/timpani

I'm afraid I may have missed something. Would anyone please help?

Thanks and rgds
/ST Wong

1 Solution

Motivator

You can't accurately determine if ignoreOlderThan is working by using the splunk list monitor command.

Your title question is "ignoreolderthan not working?". splunk list monitor will display all the directories and files that are not excluded by a blacklist= parameter. It will still show files older than 1 day, because list does not perform those date/time-stamp calculations on each file.

By the way, your two stanzas have one important difference in their directory names: a trailing slash.

[monitor:///data3/caa/caa7/] <--has a slash

[monitor:///data3/radius/timpani] <--no slash

So, you will get different output from splunk list monitor for each of them.
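For what it's worth, the check ignoreOlderThan performs at file-pickup time is just a comparison of each file's modification time against the threshold. A minimal conceptual sketch (my illustration, not Splunk's actual code; the function name is made up):

```python
import os
import time

def would_be_ignored(path: str, threshold_seconds: float) -> bool:
    """Conceptual sketch of the ignoreOlderThan test: True if the file's
    modification time is older than the threshold (e.g. 1d = 86400 s)."""
    age = time.time() - os.path.getmtime(path)
    return age > threshold_seconds
```

splunk list monitor skips this calculation entirely, which is why files older than the threshold still appear in its output.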


Communicator

Thanks for all of your advice. The universal forwarder is installed on a log server that contains a lot of archives. We hope to index only the new ones, and so tried using ignoreOlderThan. Seeing some old log files listed in the monitored list made me wonder whether ignoreOlderThan was working or not. It's clarified now.

Anyway, adding a trailing "/" to the monitor stanza gives the same result.

Thanks a lot.


Esteemed Legend

You have not explained what you are trying to accomplish by using ignoreOlderThan. You can use ignoreOlderThan, but beware that it does not work the way most people think it does: once Splunk ignores a file the first time, the file is placed on a blacklist and will never be examined again, even if new data is written to it!

http://answers.splunk.com/answers/242194/missing-events-from-monitored-logs.html

Also read here, too:

http://answers.splunk.com/answers/57819/when-is-it-appropriate-to-set-followtail-to-true.html

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that creates selective soft links to files in the real directory (/source/path/) for any file that has been touched in the last 5 minutes (or whatever your threshold is), like this:

 */5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -mmin -5 | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}

The nice thing about this hack is that you can create a similar cron job to remove soft links for files that have not changed in a while (if your forwarder has too many files to sort through, even files with no new data, it will slow WAY down), and if a file ever does get touched again, the first cron will add it back!
Don't forget to set up that 2nd cron to delete the soft links, using whatever logic lets you be sure a file will never be used again, or you will end up with tens of thousands of files here, too.
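The cleanup side could be sketched like this (the 7-day threshold and the /destination/path location are assumptions; adapt them to your own retention logic):

```shell
# prune_old_links DIR DAYS: remove soft links in DIR whose target files
# have not been modified in the last DAYS days.
# find -L follows the links, so -mtime tests the target's age;
# rm then removes the link itself, not the file it points to.
prune_old_links() {
    find -L "$1" -maxdepth 1 -type f -mtime +"$2" -exec rm -f {} +
}

# From cron, e.g. hourly:
# 0 * * * * /bin/sh /usr/local/bin/prune_links.sh /destination/path 7
```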

If that approach does not suit you (again, you have not revealed your goals), maybe you would like the batch input with move_policy = sinkhole, which causes Splunk to delete the files after forwarding them. You should only use this for files that are copied into a directory, not for files that are written to directly line by line. Search for sinkhole in the link below:

http://docs.splunk.com/Documentation/Splunk/latest/admin/inputsconf
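A batch/sinkhole stanza in inputs.conf looks roughly like this (the path and sourcetype are placeholders for your setup; double-check against the docs above, since sinkhole deletes each file once it has been read):

```
[batch:///data3/archive]
move_policy = sinkhole
sourcetype = radius_log
disabled = false
```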
