I have a folder with five files that I am trying to monitor. We gave the folder path, set the source type to Automatic, and saw that the data was indexed and ready for search. But when searching, we observed that only one file got indexed and the data from the other four files is not present. Can someone please let us know what we missed that would cause the other four files' data to be absent?
By default, Splunk uses the first 256 bytes of a file (the head) to determine whether it has seen the file before, and the last 256 bytes (the tail) to see whether the file has changed since the last time it saw it. This information is stored in the fishbucket (where a fisherman puts the heads and tails that he removes from the fish he catches). Your files have the same first 256 bytes, so you need to increase the value of initCrcLength to 256 bytes more than the length of the constant part at the beginning of the files. If your files are identical, you can instead use the crcSalt=<SOURCE> feature to include the name of the file as a discriminating factor; with it set, any time that you rename the file (as long as it still matches your monitor stanza), it will be reindexed. By default this is NOT the case.
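As a rough sketch of what this could look like in inputs.conf: the monitored path /opt/data/myfiles and the value 1024 below are hypothetical placeholders, not taken from your setup, so adjust them to your directory and to the actual length of the shared header. Only the setting names initCrcLength and crcSalt come from the spec.

```ini
# Hypothetical monitor stanza; replace the path with your own folder.
[monitor:///opt/data/myfiles]

# Option 1: read more than the default 256 bytes before computing the CRC.
# 1024 is an illustrative value; choose the length of the common header
# plus 256. Allowed range is 256 to 1048576.
initCrcLength = 1024

# Option 2 (alternative, use only if the files are truly identical):
# add the full source path to the CRC so every file is treated as unique.
# Be careful with rolling logs, since renamed files get re-indexed.
# crcSalt = <SOURCE>
```

Comments are placed on their own lines on purpose, so that no trailing text ends up as part of a setting's value. Normally only one of the two options would be enabled at a time.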
See here:
https://docs.splunk.com/Documentation/Splunk/latest/Admin/Inputsconf
In particular, this:
crcSalt = <string>
* Use this setting to force the input to consume files that have matching CRCs
(cyclic redundancy checks).
* By default, the input only performs CRC checks against the first 256
bytes of a file. This behavior prevents the input from indexing the same
file twice, even though you might have renamed it, as with rolling log
files, for example. Because the CRC is based on only the first
few lines of the file, it is possible for legitimately different files
to have matching CRCs, particularly if they have identical headers.
* If set, <string> is added to the CRC.
* If set to the literal string "<SOURCE>" (including the angle brackets), the
full directory path to the source file is added to the CRC. This ensures
that each file being monitored has a unique CRC. When crcSalt is invoked,
it is usually set to <SOURCE>.
* Be cautious about using this setting with rolling log files; it could lead
to the log file being re-indexed after it has rolled.
* In many situations, initCrcLength can be used to achieve the same goals.
* Default: empty string.
initCrcLength = <integer>
* How much of a file, in bytes, that the input reads before trying to
identify whether it is a file that has already been seen. You might want to
adjust this if you have many files with common headers (comment headers,
long CSV headers, etc) and recurring filenames.
* Cannot be less than 256 or more than 1048576.
* CAUTION: Improper use of this setting will cause data to be re-indexed. You
might want to consult with Splunk Support before adjusting this value - the
default is fine for most installations.
* Default: 256 (bytes).
Could you include what the file paths look like, and your inputs.conf stanza for this directory?