I'm attempting to set up Splunk monitoring for a Windows directory on a shared drive server.
At the moment, most files are detected and indexed by Splunk, but extremely small files (around 2KB to 20KB) are ignored.
I have tried using "crcSalt = <SOURCE>", so I don't believe the files are being registered as duplicates. Does Splunk have a minimum file size requirement for files to be indexed?
It's probably not due to the size of the files themselves but to their contents: if the "headers" of those files repeat, Splunk doesn't treat the files as unique.
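One way to check the repeated-header theory is to look for files whose beginnings are byte-for-byte identical. The sketch below is only a diagnostic, not how Splunk itself computes its checksum: it assumes the default initial CRC length of 256 bytes (`initCrcLength` in inputs.conf), uses SHA-256 purely as a convenient fingerprint, and the function name `group_by_head` and the sample file names are made up for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path

HEAD_BYTES = 256  # assumption: matches Splunk's default initCrcLength


def group_by_head(directory, head_bytes=HEAD_BYTES):
    """Return lists of file names that share the same first `head_bytes` bytes."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                # Fingerprint only the beginning of the file, not the whole file.
                digest = hashlib.sha256(fh.read(head_bytes)).hexdigest()
            groups[digest].append(path.name)
    # Keep only groups where more than one file starts identically.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical sample data: two files with a shared 320-byte header.
    with tempfile.TemporaryDirectory() as tmp:
        header = b"# report header\n" * 20
        Path(tmp, "a.log").write_bytes(header + b"row 1\n")
        Path(tmp, "b.log").write_bytes(header + b"row 2\n")
        Path(tmp, "c.log").write_bytes(b"unique start\n" + header)
        print(group_by_head(tmp))
```

If the missing files show up in the same group as files that were indexed, the header explanation becomes more plausible and it is worth checking which settings actually apply to that monitor stanza.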
Also be aware that ingesting (many small) files over a CIFS share is an extremely inefficient way to do it. It's better to have some kind of batch process that monitors the contents of the remote directory and copies the files over locally to be ingested than to keep the remote share monitored. Been there, done that, managed to convince the customer after some two years. Performance skyrocketed afterward.
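As a rough idea of what such a batch process could look like, here is a minimal sketch that copies new or changed files from the share into a local directory that Splunk would monitor instead. The function name `copy_new_files` and both directory arguments are hypothetical, scheduling (Task Scheduler, cron, and so on) is left out, and it only handles a flat directory.

```python
import shutil
import tempfile
from pathlib import Path


def copy_new_files(remote_dir, local_dir):
    """Copy files that are missing locally or differ in size or age from the remote copy."""
    remote, local = Path(remote_dir), Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(remote.iterdir()):
        if not src.is_file():
            continue  # this sketch ignores subdirectories
        dst = local / src.name
        src_stat = src.stat()
        if dst.exists():
            dst_stat = dst.stat()
            same_size = dst_stat.st_size == src_stat.st_size
            not_older = int(dst_stat.st_mtime) >= int(src_stat.st_mtime)
            if same_size and not_older:
                continue  # local copy is already up to date
        # copy2 preserves the modification time, so the next run can skip this file.
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied


if __name__ == "__main__":
    # Temporary directories stand in for the CIFS share and the local landing folder.
    with tempfile.TemporaryDirectory() as share, tempfile.TemporaryDirectory() as landing:
        Path(share, "small.log").write_text("event 1\n")
        print(copy_new_files(share, landing))  # first run copies the file
        print(copy_new_files(share, landing))  # second run finds nothing new
```

The monitor stanza would then point at the local landing directory rather than at the share.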
Hi @jayv
There is no minimum file size requirement in Splunk for indexing files. Splunk can index files as small as a few bytes.
Please could you run the following and share the output for one of the monitor stanzas of a file not being ingested?
$SPLUNK_HOME/bin/splunk btool inputs list --debug
Can you also confirm that the Splunk user has permissions to read these files? It might be worth searching the _internal index for one of the missing filenames to see if there are any other errors or permission issues relating to the file which might also pinpoint the issue.
Splunk does not have a minimum file size requirement for indexing files, so files in the 2KB–20KB range should be ingested unless something in your configuration or environment is causing them to be skipped.
How are you reading these files? With a Universal Forwarder (UF) or a full Splunk instance?
Ensure files actually have new content or new lines.
To test, manually drop a small test file into the monitored directory and search the _internal logs to see what they report about it:
index=_internal source=*metrics.log OR source=*splunkd.log "*FILENAME*"
Regards,
Prewin
If this answer helped you, please consider marking it as the solution or giving it karma. Thanks!