Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?


I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:

{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}

(For the purposes of this question, please overlook the code and time properties.)
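As a reference point outside Splunk, here is a minimal Python sketch of what a standards-conforming JSON parser does with these four lines. The helper name `resolved_test_values` is hypothetical, and the sketch says nothing about how Splunk itself parses events.

```python
import json

# The four test lines from the question. The raw string (r'''...''') keeps the
# \uXXXX escape sequences as literal backslash sequences, as they are in the file.
JSON_LINES = r'''{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}'''


def resolved_test_values(text):
    """Parse each non-empty line as JSON and return its 'test' value."""
    return [json.loads(line)["test"] for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for value in resolved_test_values(JSON_LINES):
        print(value)
```

In each parsed value, the escape sequence resolves to the same character that appears literally at the start of the string.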

In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the test property values.

I was happy to see that it does:

[Screenshot of the Splunk Web search results; image not available]

But wait: where's the not sign?

I looked at the raw events in Splunk Web:

{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}


  • In case you're wondering, I use a transform to remove the code property.
  • My props.conf file specifies KV_MODE = json

Splunk replaced the not sign in the original incoming JSON Lines with the character sequence \xAC!

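While AC is the correct hexadecimal Unicode code point for the not sign (U+00AC), \x is not a valid escape sequence in JSON! The only escapes JSON allows are \", \\, \/, \b, \f, \n, \r, \t, and \u followed by four hexadecimal digits.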

By introducing this escape sequence, Splunk has corrupted the JSON.
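The claim can be checked with any strict JSON parser. The following is a minimal sketch using Python's standard `json` module; `is_valid_json` is a hypothetical helper, and the two strings are the not-sign event as it was sent (minus the code property) and as Splunk Web displayed it.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False if the parser rejects it."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Raw strings keep the backslash sequences literal, exactly as displayed.
ORIGINAL_EVENT = r'{"time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}'
SPLUNK_RAW_EVENT = r'{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}'

if __name__ == "__main__":
    print("original event valid:", is_valid_json(ORIGINAL_EVENT))
    print("raw event as displayed valid:", is_valid_json(SPLUNK_RAW_EVENT))
```

A strict parser accepts the original line and rejects the version containing \xAC.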

This looks like a bug to me.

I'm wondering what makes the not sign "special"; why it gets this "bogus" (in the context of JSON) escaping, but other characters don't. I note that the other characters are more easily available on a standard US keyboard.
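One difference that can be checked outside Splunk is that, of the four literal characters, only the not sign lies outside ASCII. The Python sketch below (with hypothetical helpers `describe` and `decodes_as_utf8`) prints each character's code point and encodings. It also shows that the single-byte ISO-8859-1 encoding of the not sign is the byte 0xAC, which is not valid UTF-8 on its own. Whether this is what triggers the \xAC rendering is only an assumption: the encoding of the test file is not stated here, and nothing in this thread confirms the cause.

```python
# The four characters that appear literally in the test values.
CHARS = ["|", "@", "#", "¬"]


def describe(ch):
    """Return the code point, ASCII status, and two encodings of one character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "is_ascii": ord(ch) < 0x80,
        "utf8": ch.encode("utf-8"),
        "latin1": ch.encode("latin-1"),
    }


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for ch in CHARS:
        info = describe(ch)
        print(ch, info, "latin1 bytes valid as UTF-8:", decodes_as_utf8(info["latin1"]))
```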

My question(s)

  • Is this behavior a bug, as I suspect?
  • How many other characters are affected by this behavior?

Ultra Champion
| makeresults 
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}" 
| multikv noheader=t 
| spath 
| fields - _*
| table code time test

In Splunk version 8, this is fixed. The search above returns:

code                time                       test
variant-characters  2019-10-15T10:00:00+08:00  | (vertical bar): |
variant-characters  2019-10-15T10:00:00+08:00  @ (commercial at): @
variant-characters  2019-10-15T10:00:00+08:00  # (number sign, hash): #
variant-characters  2019-10-15T10:00:00+08:00  ¬ (not sign): ¬