
Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?

Graham_Hanningt
Builder

I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:

{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}

(For the purposes of this question, please overlook the code and time properties.)

In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the test property values.
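As a baseline for comparison, here is a minimal Python sketch of what a standards-compliant JSON parser does with these four lines. It is not part of the Splunk setup described here; the names `JSON_LINES` and `resolved_test_values` are made up for illustration.

```python
import json

# The four test lines from the file above. A raw string keeps the \uXXXX
# sequences literal, so json.loads (not Python) is what resolves them.
JSON_LINES = r'''
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}
'''.strip().splitlines()


def resolved_test_values(lines):
    """Parse each JSON line and return its `test` value with escapes resolved."""
    return [json.loads(line)["test"] for line in lines]


for value in resolved_test_values(JSON_LINES):
    print(value)
```

With a standard parser, each escape sequence resolves to the same character that precedes the parenthesis, including the not sign.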

I was happy to see that it does:

[Screenshot: search results in Splunk Web]

But wait: where's the not sign?

I looked at the raw events in Splunk Web:

{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}

Note:

  • In case you're wondering, I use a transform to remove the code property.
  • My props.conf file specifies KV_MODE = json

Splunk replaced the literal not sign (¬) in the original incoming JSON Lines with the character sequence \xAC, while leaving the \u00AC escape sequence untouched!

While AC is the correct Unicode code point in hexadecimal for a not sign, \x is not a valid escape sequence in JSON!

By introducing this escape sequence, Splunk has corrupted the JSON.
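To illustrate the point about validity, here is a small Python sketch that feeds both forms of the event to Python's standard `json` module. The helper name `is_valid_json` is hypothetical, and the event strings are copied from the listings above.

```python
import json


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Raw event as shown in Splunk Web, containing the \x-prefixed sequence.
raw_event = r'{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}'

# The same event with the literal not sign, as in the original file.
original_event = r'{"time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}'

print(is_valid_json(raw_event))       # False: \x is not a JSON escape
print(is_valid_json(original_event))  # True
```

JSON only allows the escapes `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX` inside strings, so a strict parser rejects the first event.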

This looks like a bug to me.

I'm wondering what makes the not sign "special": why does it get this "bogus" (in the context of JSON) escaping when the other characters don't? I note that the other characters are more easily available on a standard US keyboard.

My question(s)

  • Is this behavior a bug, as I suspect?
  • How many other characters are affected by this behavior?
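One way to investigate the second question empirically is to export raw events and scan them for `\xNN` sequences. The following Python sketch assumes the events are available as plain strings; the names `X_ESCAPE` and `find_x_escapes` are invented for this example, and it says nothing about how Splunk itself decides what to escape.

```python
import re

# Matches a backslash, "x", and two hex digits, for example \xAC.
X_ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{2})')


def find_x_escapes(raw_events):
    """Return the sorted, de-duplicated hex values seen as \\xNN in the events."""
    found = set()
    for event in raw_events:
        found.update(match.upper() for match in X_ESCAPE.findall(event))
    return sorted(found)


# Two of the raw events from the listing above.
events = [
    r'{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}',
    r'{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}',
]

print(find_x_escapes(events))  # ['AC']
```

Note that this simple pattern would also match a legitimate JSON string containing an escaped backslash followed by `x` and two hex digits, so hits should be checked against the original input.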

to4kawa
Ultra Champion
| makeresults 
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}" 
| multikv noheader=t 
| spath 
| fields - _*
| table code time test

In Splunk version 8, this is fixed. The search above returns the not sign intact:

code                time                       test
variant-characters  2019-10-15T10:00:00+08:00  | (vertical bar): |
variant-characters  2019-10-15T10:00:00+08:00  @ (commercial at): @
variant-characters  2019-10-15T10:00:00+08:00  # (number sign, hash): #
variant-characters  2019-10-15T10:00:00+08:00  ¬ (not sign): ¬