So here's the deal: I've pulled down a week's worth of logs from our local server into a hierarchically structured folder, where each log file is arranged like so:
I've pointed Splunk at this file tree (the <DirectoryPathGivenToSplunk> folder), and it indexed almost all of the files. The key word is almost: some files don't register as indexed, i.e. their events aren't coming up in my searches. I've double-checked that no blacklists or whitelists are enforced on upload, and that each file is present, the correct type, not empty, not read-only or hidden, and properly formatted; frankly, I'm stumped as to what else could be causing the disconnect. Any ideas?
PS: I'm using Splunk Enterprise (Trial, I think), Version 6.2.3 on Windows 7. Please ask if more information is required.
Edits: I've run the list monitor command and double-checked that the files whose logs are missing are indeed in the list of monitored files. In addition, the entire file structure is only ~100 MB, a mere fifth of my daily indexing volume, and almost all the logs appear in searches immediately after indexing the directory, so I'm rather doubtful it's an issue with volume or speed. Even giving it 24+ hours to pick up the missing files hasn't helped. And before you ask, the missing files aren't significantly bigger or smaller than any of the others.
I would use a simpler file structure if given the chance, but I've been using the source paths to carry information about the logs that isn't present in the logs themselves. Uploading the missing files individually would be an option, if it weren't for 2 issues:
Indexing is not instantaneous, and if you have a big batch of files on a forwarder, it is going to take a while for it to clear the backlog. What does this show on one of your lagging forwarders:
/opt/splunk/bin/splunk list monitor
It probably lists all the files you are expecting, but the single Splunk instance simply hasn't been able to get through them all yet. How long have you given it?
Also make sure your files start differently and don't, for example, share a long identical header. Because of how Splunk handles log rotation, a matching header may make it think it has already seen the file, so it won't index it.
You could try running this from your forwarder to see whether the tailing processor is reading, has read, or is skipping those files:

splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus

Please let us know what you find.
It seems that Splunk is smarter than I am...
Upon closer inspection, the contents of the "missing files" were just duplicates of the previous day's logs; as such, I don't believe Splunk saw fit to index the same events twice. The problem was that my searches to verify that all files were being monitored counted distinct sources, not distinct events within the files themselves.
Takeaway lesson: if you seem to have files that aren't being indexed, double-check that their contents aren't duplicates of already-indexed logs.
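For anyone hitting the same trap, here is roughly the difference between the check I was running and the one I should have run (a sketch only; the temp index matches the stanza below, and the source path is a hypothetical placeholder):

```
index=temp | stats dc(source)                 <- what I checked: every source shows up, so all looks fine
index=temp | stats count BY source            <- what I should have checked: duplicated files add no new events
```

The first search counts sources Splunk has seen; the second shows how many events each source actually contributed.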
Please add the crcSalt parameter to inputs.conf so that Splunk treats the files separately even if the first 256 bytes match:
index = temp
crcSalt = <SOURCE>
ignoreOlderThan = 24h
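To see why files with identical beginnings collide in the first place, here's a quick shell sketch (my own illustration using cksum, not Splunk's exact algorithm; Splunk checksums roughly the first 256 bytes by default). Salting the checksummed data with the source path, which is what crcSalt = <SOURCE> does, makes otherwise-identical files distinguishable. The demo/ paths are made up for the example:

```shell
mkdir -p demo/day1 demo/day2

# Two log files that share the same long header (mimics duplicated daily logs)
head -c 300 /dev/zero > demo/day1/app.log
head -c 300 /dev/zero > demo/day2/app.log

# Unsalted: the CRC of the first 256 bytes is identical,
# so a checksum-based tracker would treat them as one file
head -c 256 demo/day1/app.log | cksum
head -c 256 demo/day2/app.log | cksum

# Salted with the source path: the checksummed data now differs,
# so the two files get distinct CRCs and both would be indexed
{ head -c 256 demo/day1/app.log; printf 'demo/day1/app.log'; } | cksum
{ head -c 256 demo/day2/app.log; printf 'demo/day2/app.log'; } | cksum
```

The first two cksum lines print the same CRC; the last two print different ones.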