Splunk Cloud Platform

How to display UTF-8 characters?

kpwaterson
Explorer

We are populating Splunk using an HEC connection with a source type of _json, set to the default character set of UTF-8.  However, a field shown in the raw data as:
"Character test: 0242 (\\u00f2): >\uC3B2<"
is displayed as:
Character test: 0242 (\u00f2): >쎲<
I would have expected the display to show the character ò, whose UTF-8 encoding is hexadecimal C3 B2, rather than the Unicode character that is displayed.

acharlieh
Influencer

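In JSON, the four hex digits of a \uXXXX escape are the Unicode code point of the character (strictly, a UTF-16 code unit, so characters outside the Basic Multilingual Plane are written as a surrogate pair). They are not the character's UTF-8 bytes.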

See: https://datatracker.ietf.org/doc/html/rfc8259#section-7

   "So, for example, a string containing only a single reverse solidus character may be represented as "\u005C"."

If the escape held UTF-8 bytes, that example would not need the leading zeros: a reverse solidus is the single byte 5C in UTF-8.

\uC3B2 is indeed HANGUL SYLLABLE SSYEOBS (쎲).

The character you're looking for, LATIN SMALL LETTER O WITH GRAVE, is correctly encoded in JSON as \u00f2.
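
If it helps to see this outside Splunk, here is a minimal sketch using Python's standard json and unicodedata modules (my choice for illustration; nothing in Splunk requires Python). The helper name describe is made up for this example. It decodes both escapes the way any conforming JSON parser would and prints what each one names.

```python
import json
import unicodedata

# A \uXXXX escape in JSON names a code point (a UTF-16 code unit),
# not a pair of UTF-8 bytes.
wrong = json.loads('"\\uC3B2"')   # the escape that was sent
right = json.loads('"\\u00f2"')   # the escape that was intended


def describe(ch):
    """Hypothetical helper: format one character as 'U+XXXX NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe(wrong))        # U+C3B2 HANGUL SYLLABLE SSYEOBS
print(describe(right))        # U+00F2 LATIN SMALL LETTER O WITH GRAVE

# C3 B2 only shows up once the character is serialized as UTF-8.
print(right.encode("utf-8"))  # b'\xc3\xb2'
```

So C3 B2 is what ò looks like on the wire as UTF-8, while 00F2 is the number a JSON escape has to carry.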

kpwaterson
Explorer

Thanks, this was a misread of the RFC on my part.  I appreciate the help.

PickleRick
SplunkTrust

Your sourcetype might be set to UTF-8, but how is your source sending the data?

kpwaterson
Explorer

It's sending an HTTP request to an HEC endpoint whose default source type is also set to _json. Here is a dump of the request:

POST /services/collector/event?host=myhost&source=KEN-STUFF&sourcetype=_json&index=galaxy&channel=FE0ECFAD-13D5-401B-847D-77833BD77131 HTTP/1.1
Host: <target URL>
User-Agent: XYGATEMA
Connection: keep-alive
Content-Type: application/json
Authorization: Splunk <HEC token>
Content-Length: 1073

{"TIME":"2023-03-24 07:56:55.707","AUDIT": {"RECORDGMT":"2023-03-24:14:56:55.707636","GMTSEQNO":null,"RECORDLCT":"2023-03-24:07:56:55.707636","RECORDAUDITKEY":"","RECORDSESSIONKEY":"","SEQNO":null,"OUTCOME":4,"WARNINGMODE":"N","TESTMODE":"N","SEVERITY":"1","ALERTED":"A","PRODUCTCODE":"EMS","SUBJECT_USERNUMBER_MAJOR":null,"SUBJECT_USERNUMBER_MINOR":null,"TARGET_USERNUMBER_MAJOR":null,"TARGET_USERNUMBER_MINOR":null,"SUBJECTLOGIN":"","SUBJECTSYSTEM":"\\GALAXY","TARGETLOGIN":"","OBJECTTYPE":"COMFORTE.1.B00","OBJECTNAME":"","OPERATION":"EMS-EVENT","TERMINAL":"","MESSAGEID":2135,"MESSAGECODE":null,"RULENAME":"","USER_DATA":"REST alert","RESULT":"07:56 24MAR23 200,00,1268 Character test: 0242 (\\u00f2): >\uC3B2<"},"SESSION": {"RECORDSESSIONKEY":"","RECORDINSTALLKEY":"","SESSIONID":"\\GALAXY.$X98B:51790513","FOUNDSESSIONSTART":"N","FOUNDSESSIONEND":"N","SESSIONNAME":"","PROCESSTHREADID":"\\GALAXY.$X98B:51790513","PROCESSTHREADID2":"\\200.0,1268","CLIENTPROGRAM":"$Unknown.unknown.unknown","ANCESTORPROCESSTHREADID":"","IPADDRV46":"","DNSNAME":"","CLIENTCURRDIR":""}}

PickleRick
SplunkTrust

So you're sending the JSON escape \uC3B2, not the literal sequence of bytes \xC3\xB2.
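
For anyone hitting the same thing, a rough sketch of the two valid ways a sender could put ò into a JSON body, again in Python as an assumption (the actual sender here is a different product, and the shortened event below is illustrative only). The function repair_misescaped is a hypothetical helper that only shows how the bad escape came about; it is not something Splunk does.

```python
import json

# Shortened, illustrative event; the real payload has many more fields.
event = {"RESULT": "Character test: >ò<"}

# Option 1: ASCII-only body, with ò written as the escape \u00f2
# (json.dumps escapes non-ASCII by default).
escaped_body = json.dumps(event).encode("utf-8")

# Option 2: the character itself, serialized as the UTF-8 bytes C3 B2.
raw_body = json.dumps(event, ensure_ascii=False).encode("utf-8")

# Both bodies parse back to the same event.
same = json.loads(escaped_body) == json.loads(raw_body) == event


def repair_misescaped(ch):
    """Hypothetical: treat a code point such as U+C3B2 as the two
    UTF-8 bytes C3 B2 and decode those bytes instead."""
    return ord(ch).to_bytes(2, "big").decode("utf-8")


print(escaped_body)
print(raw_body)
print(same)
print(repair_misescaped("\uC3B2"))
```

Either body should be acceptable for a Content-Type of application/json; the mistake in the original request was mixing the two, by placing the UTF-8 byte values inside a \u escape.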

kpwaterson
Explorer

Yes, and that was indeed the issue. Thanks.
