Splunk Dev

How do I find out what encoding the strange characters in my logs are using?

wuming79
Path Finder

Hi,

I have a sample log below. I tried to upload this data and it shows the following preview. Is it possible to display the log file correctly? This is a log file sent to me by someone else.



DalJeanis
SplunkTrust

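Basically, that white question mark in a black diamond is the Unicode replacement character (U+FFFD). It tells you that the underlying bytes are not valid in the encoding the viewer assumed, so they could not be displayed as a character.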

https://en.wikipedia.org/wiki/Specials_(Unicode_block)
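To illustrate where the diamond comes from, here is a minimal Python sketch. The sample word and the Latin-1/UTF-8 pairing are assumptions chosen for the demonstration, not something taken from the log in question.

```python
# Bytes for "café" as a Latin-1 (ISO-8859-1) writer would store them:
# the é is the single byte 0xE9.
raw = "café".encode("latin-1")

# A reader that assumes UTF-8 cannot decode the lone 0xE9 byte, so it
# substitutes U+FFFD, the replacement character (the black diamond).
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the writer actually used recovers the text.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

The same bytes give either a diamond or a readable character depending only on which encoding the reader assumes.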

I suspect, given what the values represent, that they are probably binary numbers that don't happen to map to a valid character. I'm not sure whether (or how) you can tell Splunk to extract them... Hmmm.


There are two directions you can go. One is to identify the actual underlying bytes, in which case you are going to have to use a utility on the file that is capable of seeing whatever is there, and telling you the hex byte values. (How you accomplish this is going to depend on what kind of tech you are using.)
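If Python happens to be available, the first direction could be sketched roughly like this. The function names `find_non_ascii` and `dump_file` are made up for illustration, and treating every byte at or above 0x80 as "suspicious" is an assumption that only makes sense for a mostly English log.

```python
def find_non_ascii(data: bytes, context: int = 8):
    """Return (offset, byte value, surrounding bytes as hex) for each non-ASCII byte."""
    hits = []
    for offset, value in enumerate(data):
        if value >= 0x80:  # outside plain 7-bit ASCII
            start = max(0, offset - context)
            window = data[start:offset + context + 1]
            hits.append((offset, value, window.hex(" ")))
    return hits


def dump_file(path: str) -> None:
    """Print every non-ASCII byte in the file at `path` with a little hex context."""
    # Open in binary mode so that nothing is decoded or translated on the way in.
    with open(path, "rb") as handle:
        for offset, value, window in find_non_ascii(handle.read()):
            print(f"offset {offset}: 0x{value:02x}  [{window}]")
```

Calling `dump_file` with the path to the log file lists the offsets and hex values of the bytes that are likely showing up as diamonds.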

The other is to go the opposite direction, and find out what encoding was used to create the file, and what utilities they are using to transmit it wherever it is going on the road to get to you. Somewhere along the path, some "helpful" machine is translating the code from one type to another.

https://www.centos.org/forums/viewtopic.php?t=54437
http://www.cybervaldez.com/how-to-remove-those-nasty-question-mark-with-a-diamond-symbols-from-appea...

Here's a suggestion from this page: http://www.webhostingtalk.com/showthread.php?t=622439

You're on the right track - It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.

Then again, there's a third method, which is to take the most likely English-language encodings from this page (http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Data/Configurecharactersetencoding) and try each one to see what happens. Since the rest of the logs are all in English, I would rule out all the non-English encodings.
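Outside Splunk, the trial-and-error approach could look something like the sketch below. The candidate list is an assumption, and the names are Python codec names, which may not match the spelling Splunk expects, so check the page linked above before copying one into a Splunk setting.

```python
# Hypothetical shortlist of encodings commonly seen in English-language logs.
CANDIDATES = ["utf-8", "cp1252", "latin-1"]


def try_encodings(data: bytes, candidates=CANDIDATES):
    """Decode `data` with each candidate; None marks an encoding that rejects the bytes."""
    results = {}
    for name in candidates:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


sample = "café".encode("cp1252")  # stand-in for a suspicious line from the log
for name, text in try_encodings(sample).items():
    print(name, "->", repr(text))
```

One caveat: `latin-1` accepts every possible byte, so a successful decode there proves nothing by itself. The output still has to be read by eye to see whether it makes sense.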


ddrillic
Ultra Champion

Is it possible to paste the sequence of characters here?


wuming79
Path Finder

I'm sorry. What do you mean by sequence of characters? Currently there is only one black diamond with a question mark inside it.


DalJeanis
SplunkTrust

A character in Unicode doesn't have a fixed length in bytes; in UTF-8, for example, one character takes anywhere from 1 to 4 bytes. That single black diamond might stand for 2-4 bytes. (I'm betting it's a 4-byte binary integer.)
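A quick Python sketch of both points. The integer value and the little-endian unsigned layout are purely hypothetical, picked so that the bytes are not valid UTF-8; the real field could be laid out quite differently.

```python
import struct

# UTF-8 uses between 1 and 4 bytes per character.
lengths = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€", "😀"]}
print(lengths)

# A hypothetical 4-byte little-endian unsigned integer written raw into a text log.
raw = struct.pack("<I", 4294967040)

# Read as text, bytes that are not valid UTF-8 turn into U+FFFD.
as_text = raw.decode("utf-8", errors="replace")
print(repr(as_text))

# Read as a number, the same 4 bytes give the original value back.
(value,) = struct.unpack("<I", raw)
print(value)
```

How many diamonds appear for a given run of bad bytes depends on the decoder, which is another reason to look at the raw hex rather than the rendered text.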
