I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}
(For the purposes of this question, please overlook the code and time properties.)
In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the test property values.
I was happy to see that it does:
But wait: where's the not sign?
I looked at the raw events in Splunk Web:
{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
The props.conf file specifies KV_MODE = json.
Splunk replaced the not sign in the original incoming JSON Lines with the character sequence \xAC. While AC is the correct Unicode code point in hexadecimal for a not sign, \x is not a valid escape sequence in JSON!
By introducing this escape sequence, Splunk has corrupted the JSON.
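To illustrate the point about \x outside Splunk, here is a minimal sketch using Python's standard json module, which I am assuming is a fair stand-in for a strict JSON parser. The helper name is_valid_json is my own, not anything from Splunk.

```python
import json

# Raw event as displayed by Splunk Web (copied from above).
raw_event = r'{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}'

# The same event with the literal not sign, as in the original file.
original_event = r'{"time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json(original_event))  # literal not sign plus \u00AC escape
print(is_valid_json(raw_event))       # contains the \xAC sequence
```

The original line parses, while the line containing \xAC is rejected, because JSON only allows \", \\, \/, \b, \f, \n, \r, \t and \uXXXX after a backslash.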
This looks like a bug to me.
I'm wondering what makes the not sign "special"; why it gets this "bogus" (in the context of JSON) escaping, but other characters don't. I note that the other characters are more easily available on a standard US keyboard.
My question(s)
| makeresults
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
{\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
{\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
{\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}"
| multikv noheader=t
| spath
| fields - _*
| table code time test
In Splunk version 8, this is fixed.
code time test
variant-characters 2019-10-15T10:00:00+08:00 | (vertical bar): |
variant-characters 2019-10-15T10:00:00+08:00 @ (commercial at): @
variant-characters 2019-10-15T10:00:00+08:00 # (number sign, hash): #
variant-characters 2019-10-15T10:00:00+08:00 ¬ (not sign): ¬
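As a cross-check outside Splunk, the following sketch decodes the four test lines with Python's standard json module and prints the resolved test values. It only assumes the input is exactly the four JSON Lines from the question; decoded_test_values is a hypothetical helper name.

```python
import json

# The four JSON Lines from the question, kept verbatim as raw strings.
test_lines = [
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}',
]


def decoded_test_values(lines):
    """Parse each JSON line and return its resolved test property value."""
    return [json.loads(line)["test"] for line in lines]


for value in decoded_test_values(test_lines):
    print(value)
```

The printed values should match the test column in the table above, including the not sign in the last row.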