Getting Data In

Blacklist/ignore files by size


When monitoring a directory for files (using inputs.conf), is it possible to blacklist or ignore files over a certain size? Say, for instance, a few files 100 MB or larger get dropped in; Splunk usually errors out after processing these anyway. Can I skip processing these larger files? Thanks in advance.


Esteemed Legend

I have used the following hack to solve this problem:

Create a new directory somewhere else (/destination/path/) and point the Splunk forwarder there. Then set up a cron job that selectively creates soft links in it, pointing back to the real directory (/source/path/), for any file that meets your "keep" criteria, like this:

*/5 * * * * cd /source/path/ && /bin/find . -maxdepth 1 -type f -size -100M | /bin/sed "s/^..//" | /usr/bin/xargs -I {} /bin/ln -fs /source/path/{} /destination/path/{}

Don't forget to set up a second cron job to delete the broken soft links (ones whose source files have since been deleted), or you will end up with tens of thousands of stale links in the destination directory, too.
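The cleanup cron isn't shown above, so here is a minimal sketch of what it could do, demonstrated on a throwaway directory (the paths and file names are placeholders; `-xtype l` is GNU find syntax that matches only symlinks whose targets no longer exist):

```shell
# Demonstrate the cleanup on a throwaway directory: create one good
# and one broken symlink, then delete only the broken one.
dest=$(mktemp -d)                      # stands in for /destination/path/
touch "$dest/kept.log"
ln -s "$dest/kept.log" "$dest/good"    # target exists -> link survives
ln -s "$dest/deleted.log" "$dest/bad"  # target missing -> dangling link

# The cron job body: "-xtype l" matches only dangling symlinks.
find "$dest" -maxdepth 1 -xtype l -delete

ls "$dest"    # kept.log and good remain; bad is gone
```

In a crontab this would reduce to something like `*/5 * * * * /bin/find /destination/path/ -maxdepth 1 -xtype l -delete`.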



Thank you for adding a workaround, sir. 🙂 I will give it a try, but I will still leave this issue open until Splunk adds a supported solution, such as a file-size parameter in inputs.conf. Thanks again.


Revered Legend

I don't think there is a native way to ignore files based on their size. On the other hand, Splunk can monitor files much larger than 100 MB, so could you tell us more about "Splunk usually errors after processing these anyway"?
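For reference, the monitor stanza's blacklist setting in inputs.conf filters on a path regex, not on size, which is why a size threshold needs a workaround. If the large files happen to have distinctive names or extensions, excluding them by pattern is the closest native option. A sketch (the stanza path and extensions here are placeholder examples):

```ini
[monitor:///source/path]
# blacklist is a regular expression matched against the full file path;
# there is no size-based equivalent in inputs.conf.
blacklist = \.(zip|gz|tar)$
```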
