I have a system configured with a separate search head, indexer and LWF. In order to validate our processing and refine some of the filtering of comment lines using REGEX, I want to be able to run a test suite multiple times using a test file deposited in a directory that the LWF monitors. The plan is to use the 'clean' command on the indexer between runs, then remove the test file and copy it to the LWF again to restart the input/parse cycle. However, even if I rename the file, Splunk seems to figure out that it has the same contents, and the events are not sent to the indexer. Log messages from splunkd.log are similar to the following:
01-29-2011 08:02:32.730 ERROR TailingProcessor - Ignoring path due to: File will not be read, is too small to match seekptr checksum (file=/home/cdndata/we_accesslog_extsqu_184.108.40.206_20110127_080000_02272_A.gz). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submit_issue for more info.
01-29-2011 08:02:34.729 INFO TailingProcessor - Archive file='/home/cdndata/we_accesslog_extsqu_220.127.116.11_20110126_200000_03855_A.gz' updated less than 10000ms ago, will not read it until it stops changing.
Is there a way to force the LWF to read the file and send the events, or are there any tricks we can use to make Splunk think it's a new file? We really need to keep the events constant as we proceed with refining the index, summarization and reporting parts of the system. Thanks!
Splunk keeps track of what it has already indexed by computing CRC checksums of the beginning and the end of each file. If another file matches those checksums, it will not be reindexed, even if its name is different.
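To see why a rename alone does not help, here is a minimal Python sketch of a content-only fingerprint. It is an illustration of the idea, not Splunk's actual implementation: the function name `content_fingerprint`, the 256-byte window and the use of `zlib.crc32` are all assumptions made for the example.

```python
import zlib


def content_fingerprint(data: bytes, window: int = 256) -> int:
    """CRC over the first and last `window` bytes of the data.

    Hypothetical helper: the file name is not an input, so renaming
    a file cannot change the result.
    """
    return zlib.crc32(data[:window] + data[-window:])


# Two "files" with identical contents; imagine they have different names.
original = b"GET /index.html 200\n" * 100
renamed_copy = bytes(original)

fingerprints_match = content_fingerprint(original) == content_fingerprint(renamed_copy)
print(fingerprints_match)
```

Because only the bytes go into the checksum, a copy under a new name looks like a file that has already been seen.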
There is a parameter, crcSalt, that you can add to the monitor stanza for your file in inputs.conf to overcome this behaviour. If the parameter is set to the special value <SOURCE>, the full path to the file will be added to the CRC, and every time you rename the file it should get reindexed.
[monitor:///path/to/file/xyz]
index = myIndex
crcSalt = <SOURCE>
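For the repeated test runs described in the question, one option is to copy the test file into the monitored directory under a fresh name each time, so that a path-based salt differs on every run. The Python sketch below is only an illustration under stated assumptions: `stage_test_file` and `salted_fingerprint` are hypothetical helpers, the 256-byte window and `zlib.crc32` stand in for whatever Splunk does internally, and the demo uses a throwaway temporary directory rather than a real monitored path.

```python
import shutil
import tempfile
import uuid
import zlib
from pathlib import Path


def salted_fingerprint(data: bytes, source_path: str, window: int = 256) -> int:
    """Hypothetical model of a salted CRC: the path is mixed into the checksum."""
    return zlib.crc32(source_path.encode() + data[:window])


def stage_test_file(src: Path, monitored_dir: Path) -> Path:
    """Copy src into monitored_dir under a name unique to this run."""
    target = monitored_dir / f"{src.stem}_{uuid.uuid4().hex}{src.suffix}"
    shutil.copy(src, target)
    return target


# Demo in a temporary directory (stand-in for the forwarder's monitored path).
with tempfile.TemporaryDirectory() as tmp:
    monitored = Path(tmp) / "monitored"
    monitored.mkdir()
    sample = Path(tmp) / "sample.log"
    sample.write_bytes(b"GET /index.html 200\n" * 100)

    first = stage_test_file(sample, monitored)
    second = stage_test_file(sample, monitored)

    data = sample.read_bytes()
    same_content = first.read_bytes() == second.read_bytes()
    different_salted = (
        salted_fingerprint(data, str(first)) != salted_fingerprint(data, str(second))
    )

print(same_content, different_salted)
```

The events stay constant between runs because the contents are copied unchanged; only the file name, and therefore the salted checksum, varies.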