I have a sample log below. I tried to upload this data and it shows the following preview. Is it possible to display the log file correctly? This is a log file sent to me by someone else.
Basically, that white question mark in a black diamond is the Unicode replacement character (U+FFFD). It tells you that the bytes at that position could not be decoded as a valid character in the encoding being assumed, typically UTF-8.
I suspect, given what the values represent, that they are probably binary numbers that don't happen to form a valid character sequence. I'm not sure whether (or how) you can tell Splunk to extract them... Hmmm.
There are two directions you can go. One is to identify the actual underlying bytes, in which case you are going to have to use a utility on the file that is capable of seeing whatever is there, and telling you the hex byte values. (How you accomplish this is going to depend on what kind of tech you are using.)
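As a rough illustration of that first direction, here is a minimal Python sketch that reports the offset and hex value of every byte run that is not valid UTF-8. It assumes the preview is decoding the file as UTF-8, and the function names `find_invalid_utf8` and `scan_file` are made up for this example; any hex viewer would give you the same information.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, hex string) for each byte run that is not valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; if it succeeds, nothing else is invalid.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            hex_bytes = " ".join(f"{b:02x}" for b in data[start:end])
            bad.append((start, hex_bytes))
            pos = end  # resume scanning after the offending bytes
    return bad


def scan_file(path):
    """Read a file in binary mode (no decoding) and print its invalid byte runs."""
    with open(path, "rb") as handle:
        data = handle.read()
    for offset, hex_bytes in find_invalid_utf8(data):
        print(f"offset {offset}: {hex_bytes}")
```

Calling `scan_file` on the log (with whatever path it has on your machine) lists the offending bytes, which you can then compare against code pages such as Windows-1252 to guess what the sender's system wrote. The repeated re-decoding is slow on very large files; it is meant for a sample, not for production use.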
The other is to go the opposite direction, and find out what encoding was used to create the file, and what utilities are used to transmit it along the way to you. Somewhere along the path, some "helpful" machine is converting the text from one encoding to another.
Here's a suggestion from this page - http://www.webhostingtalk.com/showthread.php?t=622439
You're on the right track - It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.
Then again, there's a third method, which is to take the encodings most likely for English text from this page (http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Data/Configurecharactersetencoding), try each one in turn, and see what happens. Since the rest of the logs are all in English, I would rule out all the non-English encodings.
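Before changing anything in Splunk, you could run the same trial-and-error on a sample of the file locally. This is only a sketch: the candidate list is my guess at the likely English encodings, the names are Python codec names (which may be spelled differently from the values on the Splunk page), and `try_encodings` is a made-up helper name.

```python
# Assumed candidates for English-language logs; adjust to match the Splunk list.
CANDIDATES = ["utf-8", "ascii", "cp1252", "latin-1"]


def try_encodings(data: bytes, encodings=CANDIDATES):
    """Decode the same bytes with each candidate; None means decoding failed."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def preview_file(path, size=2000):
    """Print how the first `size` bytes of a file look under each candidate."""
    with open(path, "rb") as handle:
        sample = handle.read(size)
    for name, text in try_encodings(sample).items():
        print(f"--- {name} ---")
        print(text if text is not None else "(could not decode)")
```

Note that `latin-1` never fails, because every byte maps to some character, so a successful decode there proves nothing by itself; you still have to look at the output and judge whether it reads sensibly. Once one candidate looks right, that is the one to try in Splunk's character set setting for that source.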