I have a Windows .ini file that I want to index on every update of the file. Right now, when the file is updated, it is not being re-indexed. The file doesn't have much data in it, only about 1K. Whenever the file is updated, not much of it changes, mostly just a couple of values referencing the build # for the application it goes with. Ideally, I would like the whole file to be re-indexed every time any change is made to it. Has anyone tried this or have thoughts on it? I guess if all else fails I could use a scripted input on a schedule, but that would mean I would not get the updates right away, and I would also get lots of useless data since most of the scheduled polls would find no change.
Hi @fredclown
By default, the Splunk monitor input runs its CRC (cyclic redundancy check) against only the first 256 bytes of a file. From the inputs.conf documentation:
* By default, the input only performs CRC checks against the first 256 bytes of a file. This behavior prevents the input from indexing the same file twice, even though you might have renamed it, as with rolling log files, for example. Because the CRC is based on only the first few lines of the file, it is possible for legitimately different files to have matching CRCs, particularly if they have identical headers.
It's likely the first 256 bytes of the .ini file never change, meaning the CRC value never changes, so the Splunk monitor never detects that the file has been updated.
To increase the length from the default value, set the following parameter in the universal forwarder's inputs.conf file.
https://docs.splunk.com/Documentation/Splunk/8.2.6/Admin/Inputsconf#MONITOR:
initCrcLength = <integer>
* How much of a file, in bytes, that the input reads before trying to identify whether it is a file that has already been seen. You might want to adjust this if you have many files with common headers (comment headers, long CSV headers, etc) and recurring filenames.
* Cannot be less than 256 or more than 1048576.
* CAUTION: Improper use of this setting causes data to be re-indexed. You might want to consult with Splunk Support before adjusting this value - the default is fine for most installations.
* Default: 256 (bytes)
Set initCrcLength to a byte count larger than the .ini file will ever be, so that the CRC covers the whole file and a change anywhere in it produces a different CRC.
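As a rough sketch of what that could look like, here is a monitor stanza for the forwarder's inputs.conf. The file path is a hypothetical placeholder (the thread does not give the real one), and 2048 is only an assumed value chosen to sit comfortably above the roughly 1K file size mentioned in the question.

```ini
# inputs.conf on the universal forwarder
# NOTE: the path below is a hypothetical placeholder; substitute the real .ini location.
[monitor://C:\MyApp\build.ini]
# Read up to 2048 bytes before computing the CRC.
# Assumed value: anything larger than the file will ever be, within the
# documented range of 256 to 1048576.
initCrcLength = 2048
```

Keeping the setting inside this one monitor stanza, rather than applying it more broadly, limits the change to this file, which seems prudent given the CAUTION in the spec about re-indexing.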
Hope this helps.
Hi
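One option might be to check only the file's modification time rather than its content, since you want it indexed every time it changes. Another is entire_md5, which checks whether the MD5 of the whole file has changed. Both are values of the CHECK_METHOD setting in props.conf: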
CHECK_METHOD = [endpoint_md5|entire_md5|modtime]
* Set CHECK_METHOD to "endpoint_md5" to have Splunk software perform a checksum
of the first and last 256 bytes of a file. When it finds matches, Splunk
software lists the file as already indexed and indexes only new data, or
ignores it if there is no new data.
* Set CHECK_METHOD to "entire_md5" to use the checksum of the entire file.
* Set CHECK_METHOD to "modtime" to check only the modification time of the
file.
* Settings other than "endpoint_md5" cause Splunk software to index the entire
file for each detected change.
* This option is only valid for [source::<source>] stanzas.
* This setting applies at input time, when data is first read by Splunk
software, such as on a forwarder that has configured inputs acquiring the
data.
* Default: endpoint_md5
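A minimal sketch of how this might be applied, assuming the .ini file is named build.ini (a hypothetical name, since the thread does not give it). Per the spec text above, the setting goes in a [source::<source>] stanza and takes effect at input time, so this would live in props.conf on the forwarder that monitors the file.

```ini
# props.conf on the forwarder that reads the file
# NOTE: "build.ini" is a hypothetical file name. The leading "..." wildcard is
# intended to match any source path ending in that name.
[source::...build.ini]
# Re-index the entire file whenever its modification time changes.
# Use entire_md5 instead to trigger only when the content itself changes.
CHECK_METHOD = modtime
```

The choice between the two values is a trade-off: modtime would also re-index when the file is rewritten with identical content, while entire_md5 should only react to real content changes.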
r. Ismo
There's a lot of troubleshooting advice there. The following documentation covers what you're looking to do.
https://docs.splunk.com/Documentation/Splunk/8.2.6/Data/Monitorfilesanddirectories