I have a sinkhole directory that consumes pretty much anything that goes in, but there are a bunch of log files that are neither indexed nor deleted.
With vim I can see some special characters: ^@ at the end of the first two fields and ^Z at the end of the file. With the :set list option, the only additional characters displayed are $ (eol), which is perfectly fine.
My question is: what is this hidden character (^@), and can it prevent Splunk from indexing?
I've tried cat old.log > new.log, but this does not eliminate those characters. They are not displayed by cat, though (unless -v is specified).
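For reference, ^@ is caret notation for the NUL byte (0x00) and ^Z is the SUB byte (0x1A), which is why cat copies them through unchanged. A minimal Python sketch for stripping them could look like the following. The function names are hypothetical, and whether removing these bytes changes what Splunk does with the file is not established in this thread.

```python
# Bytes that vim and cat -v render as ^@ (NUL, 0x00) and ^Z (SUB, 0x1A).
CONTROL_BYTES = b"\x00\x1a"


def strip_control_bytes(data: bytes) -> bytes:
    """Return data with every NUL and SUB byte removed."""
    # bytes.translate with a table of None only deletes the listed bytes.
    return data.translate(None, CONTROL_BYTES)


def clean_file(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path without NUL/SUB bytes.

    Returns the number of bytes that were dropped.
    """
    with open(src_path, "rb") as src:
        raw = src.read()
    cleaned = strip_control_bytes(raw)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return len(raw) - len(cleaned)
```

Calling clean_file("old.log", "new.log") would write a copy without those two byte values and report how many were removed.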
09-27-2011 20:39:05.104 +0200 ERROR TailingProcessor - File will not be read, is too small to match seekptr checksum (file=/logrepo/not_indexed.txt). Last time we saw this initcrc, filename was different. You may wish to use a CRC salt on this source. Consult the documentation or file a support case online at http://www.splunk.com/page/submit_issue for more info.
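The message above says Splunk has already seen this "initcrc" under a different filename, which suggests it fingerprints the start of each file. A rough Python sketch of that idea follows, to check whether two files share the same leading bytes. It rests on assumptions: the 256-byte length is what I understand Splunk's default to be, zlib.crc32 is only a stand-in for whatever checksum Splunk actually uses, and the function names are hypothetical.

```python
import zlib

# Assumption: number of leading bytes that go into the initial checksum.
HEAD_LENGTH = 256


def head_crc(data: bytes, length: int = HEAD_LENGTH) -> int:
    """CRC32 of the first `length` bytes (a stand-in, not Splunk's algorithm)."""
    return zlib.crc32(data[:length]) & 0xFFFFFFFF


def same_head(path_a: str, path_b: str, length: int = HEAD_LENGTH) -> bool:
    """True if both files start with the same `length` bytes."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return file_a.read(length) == file_b.read(length)
```

If same_head reports True for a new file and one that was indexed earlier, a fingerprint clash of the kind the error describes becomes plausible, and the CRC salt it mentions is the thing to look into.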
What is the error in the splunkd.log that says why it wasn't indexed?
Thanks for the replies. I've tried hexdump -C but could not notice anything suspicious.
My inputs.conf is really simple: just the move policy set to sinkhole, nothing else. I am starting to suspect a maximum field length restriction. The linked files contain only the header, but that is sufficient to reproduce the problem: "not_indexed.txt" does not get indexed, but "indexed.txt" does. The only difference is that indexed.txt has one less underscore character in the first field. http://dl.dropbox.com/u/8430959/indexed.txt http://dl.dropbox.com/u/8430959/not_indexed.txt
It sounds like you have a separate issue: your logs are not being indexed. Supplying your inputs.conf settings as well as a log sample would be helpful in debugging your problem.
Not an answer to why Splunk isn't indexing your log, but you can check what ASCII value the character has by running cat old.log | hexdump -C.
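If the hexdump output is hard to scan by eye, a small Python sketch along these lines could list the offsets of control bytes directly. The function names are hypothetical, and treating tab, LF and CR as "normal" is an assumption about what a log file should contain.

```python
from typing import List, Tuple

# Control bytes that are expected in ordinary text logs: tab, LF, CR.
ALLOWED = {0x09, 0x0A, 0x0D}


def find_control_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for each unexpected control byte."""
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if (value < 0x20 or value == 0x7F) and value not in ALLOWED
    ]


def caret_notation(value: int) -> str:
    """Render a control byte the way vim and cat -v do, e.g. 0x00 as ^@."""
    return "^" + chr(value ^ 0x40)


def report(path: str) -> None:
    """Print one line per unexpected control byte found in the file."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, value in find_control_bytes(data):
        print(f"offset {offset}: 0x{value:02x} ({caret_notation(value)})")
```

Running report("old.log") would show where each ^@ (0x00) and ^Z (0x1a) sits in the file.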