I have created a subfolder on a Windows Splunk indexer. Each night, a subdirectory named for the current date and containing several files is copied into it from a source, to be indexed and then deleted. There is no whitelist/blacklist applied; everything in this folder goes into the same index, same sourcetype, etc. It is a simple batch input. My issue is that certain files are not being indexed and deleted. They tend to be the bigger ones. Is it possible that a file isn't finished writing when Splunk tries to index it, and Splunk gives up or something? I can't fathom why some files work and some don't. The files that are not deleted are also not indexed.
Please accept your own answer so this is marked as resolved.
I'd appreciate an ELI5 regarding the statement in the accepted answer: "batch input, even though destructive, still recognizes crcSalt of files that aren't there."
I've inherited a similar situation where a batch input won't delete certain files. I suspect it has something to do with the initCrcLength parameter, but I'm having a hard time grasping how that setting works with batch inputs.
[batch:///opt/sbox/data/Proxies/]
move_policy = sinkhole
disabled = false
index = iron_proxy
sourcetype = cisco:wsa:squid
host_regex = \/opt\/sbox\/data\/Proxies\/(\w+)
initCrcLength = 1048576
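The CRC check works the same way for batch inputs as for regular monitor inputs (as far as I know).
Therefore initCrcLength is the number of bytes read from the start of the file to compute the checksum (256 by default). If two files begin with the same bytes up to that length, they produce the same checksum, and the second one is treated as already indexed.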
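To make the idea concrete, here is a minimal Python sketch of a checksum taken over only the first N bytes of a file. It is an illustration only, not Splunk's actual implementation: the use of zlib.crc32, the function names head_crc and salted_head_crc, and the sample file contents and paths are all assumptions made up for this example.

```python
import zlib

def head_crc(data: bytes, init_crc_length: int = 256) -> int:
    """Checksum of only the first init_crc_length bytes (hypothetical helper)."""
    return zlib.crc32(data[:init_crc_length])

def salted_head_crc(source_path: str, data: bytes, init_crc_length: int = 256) -> int:
    """Same idea, but the file path is mixed in, loosely like crcSalt = <SOURCE>."""
    return zlib.crc32(source_path.encode("utf-8") + data[:init_crc_length])

# Two made-up log files that share a 300-byte header and differ only afterwards.
header = b"#" * 299 + b"\n"
file_a = header + b"2024-01-01 GET /a\n"
file_b = header + b"2024-01-02 GET /a\n"

# With a 256-byte window only the shared header is read, so the files look identical.
same_with_default = head_crc(file_a) == head_crc(file_b)

# With a larger window the differing body is included, so the checksums differ.
same_with_large = head_crc(file_a, 1048576) == head_crc(file_b, 1048576)

# With the path mixed in, the checksums differ even with the small window.
same_with_salt = (
    salted_head_crc("/data/2024-01-01/proxy.log", file_a)
    == salted_head_crc("/data/2024-01-02/proxy.log", file_b)
)

print(same_with_default, same_with_large, same_with_salt)
```

Under these assumptions, files with long identical headers collide at a small initCrcLength, while a larger window or a per-path salt tells them apart.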
You could also change the checksum method in props.conf (CHECK_METHOD), or add crcSalt = <SOURCE> to the input stanza in inputs.conf if the full file paths are unique.