Best Practice for Large Directory of Files

chrisboy68
Contributor

Hi,

I have some very large directories. Here is my inputs.conf:

[monitor://\\server\folder]
disabled = false
host = myhost
index = mylogs
sourcetype = mytasks
ignoreOlderThan = 2d
whitelist = (MYTasks\[EXPORT.*.log|MYTasks\[IMPORT.*.log)

When I check the number of files (Data inputs » Files & directories), I see 3840 files. Because of the whitelist and the modification-time cutoff (ignoreOlderThan), most are ignored. Is this common? It seems inefficient for Splunk to track "skipped" files, and I'm not sure whether it has to re-read or touch them at every restart. I also get a ton of files listed in "services/admin/inputstatus/TailingProcessor:FileStatus". If this is normal I'll move on; I'm just trying to make sure my instance is performing optimally.

Thank you,

Chris

woodcock
Esteemed Legend

If you cannot remove the files where they are, you will eventually have so many files for Splunk to dig through that the forwarder becomes hopelessly slow at locating new events. The best way to handle it is to remove files that Splunk is not interested in, or that Splunk has already forwarded and that will never be updated (check out the batch input type). If this cannot be done, you can have Splunk monitor a different directory and write a script, scheduled as a cron job, that creates (and removes) soft links for the files that Splunk needs to watch.
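For anyone who wants a starting point for the soft-link approach, here is a rough Python sketch. It is only an illustration under assumptions that are not from this thread: the share is mounted on a Unix-like host where symlinks work, the function name `sync_links` and its parameters are made up, the pattern is an adaptation of the whitelist above applied to the file name only, and the two-day cutoff mirrors `ignoreOlderThan = 2d`. Adjust all of these to your environment.

```python
import os
import re
import time
from pathlib import Path

# Assumed values for illustration only; adapt them to your own input.
WHITELIST = re.compile(r"MYTasks\[(EXPORT|IMPORT).*\.log$")
MAX_AGE_SECONDS = 2 * 24 * 60 * 60  # mirrors ignoreOlderThan = 2d


def sync_links(source_dir, link_dir, pattern=WHITELIST,
               max_age=MAX_AGE_SECONDS, now=None):
    """Keep link_dir populated with symlinks to recent, matching files.

    Returns the sorted names of the links created on this run.
    """
    links = Path(link_dir)
    links.mkdir(parents=True, exist_ok=True)
    now = time.time() if now is None else now

    # Collect the files that should currently be visible to the monitor.
    wanted = {}
    for entry in os.scandir(source_dir):
        if not entry.is_file():
            continue  # top level only, no recursion into subdirectories
        if not pattern.search(entry.name):
            continue
        if now - entry.stat().st_mtime > max_age:
            continue
        wanted[entry.name] = Path(entry.path).resolve()

    # Remove links whose target is no longer wanted (too old or deleted).
    for link in links.iterdir():
        if link.is_symlink() and link.name not in wanted:
            link.unlink()

    # Create links that are missing; leave existing entries alone.
    created = []
    for name, target in wanted.items():
        link = links / name
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(target)
            created.append(name)
    return sorted(created)
```

You would call something like `sync_links("/mnt/share/folder", "/opt/splunk_links/mytasks")` from a small wrapper that cron runs every few minutes (both paths are placeholders), and point the monitor stanza at the link directory instead of the original one. That way the forwarder only ever sees the handful of files it actually needs.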

chrisboy68
Contributor

Thank you for your reply. This made me dig more, and I noticed that some directories were being archived. Then, with more reading, I saw:

recurse = [true|false]
* If true, recurse directories within the directory specified in [fschange].
* Defaults to true.

Hence, setting recurse = false really reduced the I/O read time.

Thanks!

Chris

woodcock
Esteemed Legend

Yes, I would have suggested this if you had said anything about subdirectories. It works very well.
