Getting Data In

Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?


I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:

{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}

(For the purposes of this question, please overlook the code and time properties.)

In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the test property values.

I was happy to see that it does:

alt text

But wait: where's the not sign?

I looked at the raw events in Splunk Web:

{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}


  • In case you're wondering, I use a transform to remove the code property.
  • My props.conf file specifies KV_MODE = json

Splunk replaced the not sign in the original incoming JSON Lines with the character sequence \xAC!

While AC is the correct Unicode code point in hexadecimal for a not sign, \x is not a valid escape sequence in JSON!

By introducing this escape sequence, Splunk has corrupted the JSON.

This looks like a bug to me.

I'm wondering what makes the not sign "special"; why it gets this "bogus" (in the context of JSON) escaping, but other characters don't. I note that the other characters are more easily available on a standard US keyboard.

My question(s)

  • Is this behavior a bug, as I suspect?
  • How many other characters are affected by this behavior?
0 Karma

Ultra Champion
| makeresults 
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}" 
| multikv noheader=t 
| spath 
| fields - _*
| table code time test

In Splunk version 8, this is fixed.

code    time    test
variant-characters  2019-10-15T10:00:00+08:00   | (vertical bar): |
variant-characters  2019-10-15T10:00:00+08:00   @ (commercial at): @
variant-characters  2019-10-15T10:00:00+08:00   # (number sign, hash): #
variant-characters  2019-10-15T10:00:00+08:00   ¬ (not sign): ¬ 
0 Karma
Get Updates on the Splunk Community!

Thanks for the Memories! Splunk University, .conf24, and Community Connections

Thank you to everyone in the Splunk Community who joined us for .conf24 – starting with Splunk University and ...

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

 (view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...