
How do I find out what encoding the strange characters in my logs are using?

wuming79
Path Finder

Hi,

I have a sample log below. I tried to upload this data and it shows the following preview. Is it possible to display the log file correctly? This is a log file sent to me by someone else.

DalJeanis
Legend

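Basically, that white question mark in a black diamond tells you that the character is not valid Unicode. It is U+FFFD, the replacement character: the program displaying the file substitutes it for bytes that it could not decode in the encoding it assumed (usually UTF-8). In most cases the diamond is not in the file itself; the original bytes are still there.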

https://en.wikipedia.org/wiki/Specials_(Unicode_block)
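To make this concrete, here is a small Python sketch. The byte string is made up for illustration, not taken from the original log: it assumes the file was written in Windows-1252 (cp1252), where byte 0xB0 is the degree sign. Reading it as UTF-8 produces the diamond, while the real byte is still there:

```python
# Hypothetical log fragment: "temp=°C" as written by a Windows-1252 (cp1252) program.
raw = b"temp=\xb0C"

# A viewer that assumes UTF-8 cannot decode 0xB0 on its own, so it substitutes U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding actually used to write the file recovers the character.
as_cp1252 = raw.decode("cp1252")

print(ascii(as_utf8))   # 'temp=\ufffdC'
print(as_cp1252)        # temp=°C

# The replacement character (UTF-8 bytes EF BF BD) is not in the file itself.
print(b"\xef\xbf\xbd" in raw)  # False
```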

I suspect, given what the values represent, that they are probably binary numbers whose bytes don't happen to form valid characters. I'm not sure whether (or how) you can tell Splunk to extract them... Hmm.


There are two directions you can go. One is to identify the actual underlying bytes. In that case you will need to run a utility on the file that can show you whatever is there as hex byte values, for example a hex dump tool such as xxd or hexdump on Linux. (How you accomplish this will depend on what kind of tech you are using.)
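If Python is available, that first approach can be sketched in a few lines. This is a minimal sketch, assuming the file is supposed to be UTF-8; the function name find_undecodable is hypothetical. It reports the offset and hex value of every byte run that fails to decode, plus a little surrounding context:

```python
def find_undecodable(data: bytes, encoding: str = "utf-8", context: int = 8):
    """Return (offset, hex_bytes, context_text) for each byte run that fails to decode."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we tried to decode.
            start, end = pos + err.start, pos + err.end
            hex_bytes = " ".join(f"{b:02x}" for b in data[start:end])
            # Show neighbouring bytes, escaping anything undecodable as \xNN.
            window = data[max(0, start - context):end + context]
            problems.append((start, hex_bytes, window.decode(encoding, errors="backslashreplace")))
            pos = end  # resume scanning just after the bad bytes
    return problems


for offset, hex_bytes, snippet in find_undecodable(b"value=\xff;ok"):
    print(f"offset {offset}: {hex_bytes}  context: {snippet!r}")
```

On a real file you would pass the raw bytes, e.g. find_undecodable(open("sample.log", "rb").read()), where sample.log is a placeholder name. Opening in binary mode matters: it stops Python from decoding (and possibly replacing) the bytes before you get to look at them.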

The other is to go in the opposite direction: find out what encoding was used to create the file, and what utilities are used to transmit it on its way to you. Somewhere along that path, some "helpful" machine is translating the text from one encoding to another.

https://www.centos.org/forums/viewtopic.php?t=54437
http://www.cybervaldez.com/how-to-remove-those-nasty-question-mark-with-a-diamond-symbols-from-appea...
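To see what that kind of translation does, here is a Python sketch with a made-up string. If UTF-8 bytes are read by something that assumes Windows-1252 (cp1252), each accented character turns into two odd characters. When you can guess the two encodings involved, the mistake can often be reversed:

```python
original = "café"

# The producer writes UTF-8: 'é' becomes the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")

# A machine along the way wrongly assumes cp1252 and decodes those bytes.
garbled = utf8_bytes.decode("cp1252")
print(garbled)   # cafÃ©

# Reversing the mistake: re-encode with the wrong charset, decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # café
```

This only works when nothing was replaced by a diamond along the way. Once a tool has substituted U+FFFD and saved the result, the original bytes are gone from that copy, and you would need an earlier copy of the file.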

Here's a suggestion from this page: http://www.webhostingtalk.com/showthread.php?t=622439

You're on the right track - It's a character-set issue. Get a tool that inspects the response headers of the server (like the Firebug extension if you're using Mozilla Firefox) to see what character set the server response is sending with the content. If the server's character-set and the HTML character set of the actual content don't match up, you will see some strange looking characters like those little black diamond squares.

Then again, there's a third method: take the most likely English encodings from this page (http://docs.splunk.com/Documentation/SplunkCloud/6.6.0/Data/Configurecharactersetencoding), try each one, and see what happens. Since the rest of the logs are all in English, I would rule out the non-English encodings.
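The trial-and-error approach can also be done outside Splunk before re-indexing anything. A minimal Python sketch, assuming you can cut a small sample of the raw bytes around the problem spot. The candidate list is my own guess at the usual English-language suspects, not the list from the Splunk page:

```python
def try_encodings(raw: bytes, candidates=("ascii", "utf-8", "cp1252", "latin-1")):
    """Decode raw with each candidate; map encoding -> text, or None if decoding fails."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)  # strict: raises on any invalid byte
        except UnicodeDecodeError:
            results[enc] = None
    return results


sample = b"temp=\xb0C"  # hypothetical sample containing one non-ASCII byte
for enc, text in try_encodings(sample).items():
    print(f"{enc:8} -> {text!r}" if text is not None else f"{enc:8} -> fails")
```

Note that latin-1 maps every possible byte to some character, so it never fails. A clean decode only rules encodings out; it does not prove one is right, so check the output by eye. Once you have a candidate, Splunk lets you set the encoding per sourcetype with the CHARSET attribute in props.conf, as described on the page above.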

ddrillic
Ultra Champion

Is it possible to paste the sequence of characters here?

wuming79
Path Finder

I'm sorry, what do you mean by a sequence of characters? Currently there is only one black diamond with a question mark inside it in the line in question.

DalJeanis
Legend

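It's not certain how many bytes are behind that one character. In UTF-8 a single character can be 1 to 4 bytes long, and one diamond can stand in for more than one undecodable byte, so that single black diamond might be 2-4 bytes long. (I'm betting it's a 4-byte binary integer.)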
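To illustrate why one diamond can hide several bytes, here is a Python sketch with an invented record. The "id=" field and the little-endian layout are assumptions for the example, not facts about this log. Only bytes that are invalid UTF-8 turn into diamonds; the other bytes of a binary integer are control characters that most viewers show as nothing:

```python
raw = b"id=\x9a\x02\x00\x00;"  # hypothetical: 'id=' followed by a 4-byte binary integer

shown = raw.decode("utf-8", errors="replace")
print(ascii(shown))            # 'id=\ufffd\x02\x00\x00;'  (one diamond, three invisible bytes)
print(shown.count("\ufffd"))   # 1

# Reading the same four bytes as an integer (the byte order is a guess here).
value = int.from_bytes(raw[3:7], "little")
print(value)                   # 666
```

If the hex bytes behind the diamond, read as an integer in either byte order, match a value you would expect in that field, that would support the binary-integer theory.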
