
Recognizing Unicode

Communicator

Hi there, I have a problem where I am receiving JSON data via TCP, but I am unable to convert the Unicode escape sequences into the correct characters.

For example:

Search string: sourcetype = 123, results =

APPLICATION_NAME:  ABC
ADDRESS: %u0e1b%u0e32%u0e01%u0e41%u0e1e%u0e23%u0e01 

From what I understand, I should add the following under $SPLUNK_HOME/etc/system/local/props.conf:

[123]
CHARSET=TIS-620

Then, after running the command | extract reload=T, that should work.
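
If it helps, one way to confirm which props.conf settings Splunk has actually picked up for the sourcetype is btool (a minimal sketch; the stanza name 123 is an assumption based on the search above):

$SPLUNK_HOME/bin/splunk btool props list 123 --debug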

Any idea? Here's the link to the Unicode chart if anyone is interested:

http://www.unicode.org/charts/PDF/U0E00.pdf

1 Solution

Communicator

Found a workaround by having a macro:

| eval ADDRESS=replace(ADDRESS, "%u0e01", "ก")
| eval ADDRESS=replace(ADDRESS, "%u0e02", "ข")

... and repeat for the roughly 50 remaining Thai characters.

Tedious, but it works. I believe the problem is that the server is not forwarding the data in the correct Unicode encoding, hence the manual work.
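
For reference, a minimal macros.conf sketch of how such a macro could be defined (the macro name decode_thai_address is an assumption; only the first two characters are shown):

[decode_thai_address]
definition = eval ADDRESS=replace(ADDRESS, "%u0e01", "ก") | eval ADDRESS=replace(ADDRESS, "%u0e02", "ข")

It would then be called in a search as ... | `decode_thai_address`.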



Builder

Try using the charset ISO-IR-166; after changing the value, restart the Splunk service.
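
For example, a minimal sketch of the change, assuming the same sourcetype stanza as above:

# $SPLUNK_HOME/etc/system/local/props.conf
[123]
CHARSET = ISO-IR-166

# then restart Splunk
$SPLUNK_HOME/bin/splunk restart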

Regards,


Communicator

When I make the change, will it take effect for already-indexed events, or only for newer incoming events?


Builder

Only affects new events.


Communicator

Didn't work for us. Is the Unicode supposed to be displayed like that, with a percent-style escape code in front of each character?
