Splunk Cloud Platform

How to display UTF-8 characters?

kpwaterson
Explorer

We are populating Splunk using an HEC connection with a source type of _json, set to the default character set of UTF-8.  However, a field shown in the raw data as:
"Character test: 0242 (\\u00f2): >\uC3B2<"
is displayed as:
Character test: 0242 (\u00f2): >쎲<
I would have expected the display to show the character ò, whose UTF-8 encoding is the hexadecimal bytes C3 B2, rather than the Unicode character that was displayed.


acharlieh
Influencer

In JSON, the four hex digits in a \uXXXX escape are the character's Unicode code point (strictly, a UTF-16 code unit), not its UTF-8 bytes.

See: https://datatracker.ietf.org/doc/html/rfc8259#section-7

   So, for example, a string containing only a single reverse solidus
   character may be represented as "\u005C".

If the escape held UTF-8 bytes, it wouldn't have the leading zeros: a reverse solidus is the single byte 5C in UTF-8.

\uc3b2 is indeed HANGUL SYLLABLE SSYEOBS (쎲).

The character you're looking for, LATIN SMALL LETTER O WITH GRAVE (ò), is correctly encoded in JSON as \u00f2.
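
A quick way to check this is a minimal Python sketch like the one below. It uses only the standard library and is not Splunk-specific. It parses both escapes with a JSON parser and then shows where the bytes C3 B2 actually come from.

```python
import json
import unicodedata

# Parse the two JSON escapes discussed above.
hangul = json.loads('"\\uC3B2"')
o_grave = json.loads('"\\u00f2"')

print(unicodedata.name(hangul))   # HANGUL SYLLABLE SSYEOBS
print(unicodedata.name(o_grave))  # LATIN SMALL LETTER O WITH GRAVE

# C3 B2 is how U+00F2 is stored as UTF-8 bytes; it is not the code point.
utf8_bytes = o_grave.encode("utf-8")
print(utf8_bytes.hex())           # c3b2
```

So C3 B2 only appears once the string is serialized as UTF-8; inside a \u escape the parser expects the code point, 00F2.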

kpwaterson
Explorer

Thanks, this was a misread of the RFC on my part.  I appreciate the help.

PickleRick
SplunkTrust

Your sourcetype might be set to UTF-8, but how is your source sending the data?

kpwaterson
Explorer

It's sending an HTTP request to an HEC endpoint whose default source type is also set to _json.  Here is a dump of the request:

POST /services/collector/event?host=myhost&source=KEN-STUFF&sourcetype=_json&index=galaxy&channel=FE0ECFAD-13D5-401B-847D-77833BD77131 HTTP/1.1
Host: <target URL>
User-Agent: XYGATEMA
Connection: keep-alive
Content-Type: application/json
Authorization: Splunk <HEC token>
Content-Length: 1073

{"TIME":"2023-03-24 07:56:55.707","AUDIT": {"RECORDGMT":"2023-03-24:14:56:55.707636","GMTSEQNO":null,"RECORDLCT":"2023-03-24:07:56:55.707636","RECORDAUDITKEY":"","RECORDSESSIONKEY":"","SEQNO":null,"OUTCOME":4,"WARNINGMODE":"N","TESTMODE":"N","SEVERITY":"1","ALERTED":"A","PRODUCTCODE":"EMS","SUBJECT_USERNUMBER_MAJOR":null,"SUBJECT_USERNUMBER_MINOR":null,"TARGET_USERNUMBER_MAJOR":null,"TARGET_USERNUMBER_MINOR":null,"SUBJECTLOGIN":"","SUBJECTSYSTEM":"\\GALAXY","TARGETLOGIN":"","OBJECTTYPE":"COMFORTE.1.B00","OBJECTNAME":"","OPERATION":"EMS-EVENT","TERMINAL":"","MESSAGEID":2135,"MESSAGECODE":null,"RULENAME":"","USER_DATA":"REST alert","RESULT":"07:56 24MAR23 200,00,1268 Character test: 0242 (\\u00f2): >\uC3B2<"},"SESSION": {"RECORDSESSIONKEY":"","RECORDINSTALLKEY":"","SESSIONID":"\\GALAXY.$X98B:51790513","FOUNDSESSIONSTART":"N","FOUNDSESSIONEND":"N","SESSIONNAME":"","PROCESSTHREADID":"\\GALAXY.$X98B:51790513","PROCESSTHREADID2":"\\200.0,1268","CLIENTPROGRAM":"$Unknown.unknown.unknown","ANCESTORPROCESSTHREADID":"","IPADDRV46":"","DNSNAME":"","CLIENTCURRDIR":""}}

PickleRick
SplunkTrust

So you're sending the JSON escape \uC3B2, not the literal byte sequence \xC3\xB2.
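
On the sending side, either form works as long as the two aren't mixed up. Below is a minimal Python sketch of the two ways to put ò into a JSON body: as the \u00f2 escape, or as the raw UTF-8 bytes C3 B2. The RESULT field name is borrowed from the request dump above; building the payload with Python's json module is an assumption for illustration only, since the thread doesn't say how the sender generates its JSON.

```python
import json

# Hypothetical event; the string holds the single character U+00F2.
event = {"RESULT": "Character test: >\u00f2<"}

# Option 1: ASCII-only body. json.dumps escapes U+00F2 as \u00f2 by default.
escaped_body = json.dumps(event).encode("utf-8")

# Option 2: raw UTF-8 body. The character goes out as the bytes C3 B2.
raw_body = json.dumps(event, ensure_ascii=False).encode("utf-8")

print(escaped_body)
print(raw_body)

# A JSON parser recovers the same event from either body.
same_event = json.loads(escaped_body) == json.loads(raw_body) == event
print(same_event)
```

What doesn't work is writing the UTF-8 byte values into a \u escape, which is how \uC3B2 ended up in the request.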

kpwaterson
Explorer

Yes, and that was indeed the issue.  Thanks
