I have a sinkhole directory that eats pretty much anything that goes in, but there are a bunch of log files that are neither indexed nor deleted.
In vim I can see some special characters: ^@ at the end of the first two fields and ^Z at the end of the file. With the :set list option, the only additional characters displayed are $ (end of line), which is perfectly fine.
My question is: what is this hidden character (^@), and can it prevent Splunk from indexing the file?
I've tried cat old.log > new.log, but this does not eliminate those characters. cat just doesn't display them (unless -v is specified).
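cat copies bytes unchanged, so redirecting its output into a new file reproduces the same bytes. Vim renders control characters as ^ plus a letter, so ^@ is a NUL byte (0x00) and ^Z is the SUB / Ctrl-Z byte (0x1A). Below is a minimal Python sketch that writes a copy with those two bytes removed; the script name and command-line usage are hypothetical, and it assumes no other control bytes need stripping.

```python
import sys

# Bytes that vim displays as ^@ (NUL, 0x00) and ^Z (SUB, 0x1A).
UNWANTED = b"\x00\x1a"


def strip_control_bytes(data: bytes) -> bytes:
    """Return data with every NUL and SUB byte removed."""
    # With table=None, bytes.translate only deletes the listed bytes.
    return data.translate(None, UNWANTED)


def clean_file(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst and return how many bytes were dropped."""
    with open(src, "rb") as f:
        data = f.read()
    cleaned = strip_control_bytes(data)
    with open(dst, "wb") as f:
        f.write(cleaned)
    return len(data) - len(cleaned)


if __name__ == "__main__":
    # Hypothetical usage: python strip_nul.py old.log new.log
    if len(sys.argv) == 3:
        dropped = clean_file(sys.argv[1], sys.argv[2])
        print(f"removed {dropped} byte(s)")
```

The same effect is available from the shell with tr -d '\000\032' < old.log > new.log, since 032 is octal for 0x1A.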
09-27-2011 20:39:05.104 +0200 ERROR TailingProcessor - File will not be read, is too small to match seekptr checksum (file=/logrepo/notindexed.txt). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submitissue for more info.
Thanks for the replies. I've tried hexdump -C and couldn't notice anything suspicious.
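Scanning hexdump -C output by eye makes it easy to miss a single 00 byte. This hypothetical Python helper prints the offset of every control byte other than tab, LF, and CR, using the same zero-based hex offsets as hexdump -C. It is a sketch that assumes those three are the only control bytes a normal log line should contain.

```python
import sys

# Control bytes treated as normal in a text log: tab, LF, CR.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_control_bytes(data: bytes) -> list:
    """Return (offset, byte value) for each control byte not in ALLOWED_CONTROL."""
    hits = []
    for offset, value in enumerate(data):
        # 0x00-0x1F are the C0 control codes; 0x7F is DEL.
        if (value < 0x20 or value == 0x7F) and value not in ALLOWED_CONTROL:
            hits.append((offset, value))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python find_ctrl.py notindexed.txt indexed.txt
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for offset, value in find_control_bytes(f.read()):
                print(f"{path}: offset 0x{offset:08x} byte 0x{value:02x}")
```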
inputs.conf is really simple: just move_policy set to sinkhole, nothing else. I'm starting to suspect a maximum field length restriction. Here is only the header, but it is sufficient to reproduce the problem: "notindexed.txt" does not get indexed, but "indexed.txt" does. The only difference is that indexed.txt has one fewer underscore character in the first field. http://dl.dropbox.com/u/8430959/indexed.txt http://dl.dropbox.com/u/8430959/notindexed.txt
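The TailingProcessor error quoted earlier says Splunk has already seen a file with the same initcrc. Assuming that initial CRC is computed over only the first initCrcLength bytes of a file (256 by default, per the inputs.conf spec), two files sharing the same leading bytes would look like the same file no matter what follows. This Python sketch checks whether two files share that prefix. The 256 value is an assumption to verify against your own settings, and the code compares raw bytes rather than reproducing Splunk's actual checksum.

```python
import sys

# Assumed default for Splunk's initCrcLength setting; check your inputs.conf.
INIT_CRC_LENGTH = 256


def leading_bytes(path: str, length: int = INIT_CRC_LENGTH) -> bytes:
    """Read at most `length` bytes from the start of the file."""
    with open(path, "rb") as f:
        return f.read(length)


def same_prefix(a: bytes, b: bytes, length: int = INIT_CRC_LENGTH) -> bool:
    """True if both byte strings agree on their first `length` bytes."""
    return a[:length] == b[:length]


if __name__ == "__main__":
    # Hypothetical usage: python prefix_check.py notindexed.txt indexed.txt
    if len(sys.argv) == 3:
        a = leading_bytes(sys.argv[1])
        b = leading_bytes(sys.argv[2])
        print(f"read {len(a)} and {len(b)} leading bytes")
        print("identical prefix" if same_prefix(a, b) else "prefixes differ")
```

If a not-indexed file turns out to share its leading bytes with one Splunk has already consumed, the CRC salt suggested in the error message (the crcSalt setting in inputs.conf) is worth looking into.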
It sounds like you have a separate issue: the logs are not being indexed at all. Supplying your inputs.conf settings along with a log sample would help in debugging the problem.