<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence? in Getting Data In</title>
    <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-corrupts-incoming-JSON-Lines-by-introducing-bogus-x/m-p/501902#M85525</link>
    <description>&lt;P&gt;I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(For the purposes of this question, please overlook the &lt;CODE&gt;code&lt;/CODE&gt; and &lt;CODE&gt;time&lt;/CODE&gt; properties.)&lt;/P&gt;

&lt;P&gt;In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the &lt;CODE&gt;test&lt;/CODE&gt; property values.&lt;/P&gt;

&lt;P&gt;I was happy to see that it does:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/7805i3BBF61E91D2A400D/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;But wait: where's the not sign?&lt;/P&gt;

&lt;P&gt;I looked at the raw events in Splunk Web:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;In case you're wondering, I use a transform to remove the &lt;CODE&gt;code&lt;/CODE&gt; property.&lt;/LI&gt;
&lt;LI&gt;My &lt;CODE&gt;props.conf&lt;/CODE&gt; file specifies &lt;CODE&gt;KV_MODE = json&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;&lt;STRONG&gt;Splunk replaced the not sign in the original incoming JSON Lines with the character sequence &lt;CODE&gt;\xAC&lt;/CODE&gt;!&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;While &lt;CODE&gt;AC&lt;/CODE&gt; is the correct hexadecimal Unicode code point for the not sign (U+00AC), &lt;CODE&gt;\x&lt;/CODE&gt; is not a valid escape sequence in JSON!&lt;/P&gt;

&lt;P&gt;By introducing this escape sequence, Splunk has corrupted the JSON.&lt;/P&gt;

&lt;P&gt;This looks like a bug to me.&lt;/P&gt;

&lt;P&gt;I'm wondering what makes the not sign "special": why does it get this "bogus" (in the context of JSON) escaping when the other characters don't? I note that the other characters are ASCII and easy to type on a standard US keyboard, whereas the not sign (U+00AC) lies outside the ASCII range.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;My question(s)&lt;/STRONG&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Is this behavior a bug, as I suspect?&lt;/LI&gt;
&lt;LI&gt;How many other characters are affected by this behavior?&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Tue, 15 Oct 2019 07:42:07 GMT</pubDate>
    <dc:creator>Graham_Hanningt</dc:creator>
    <dc:date>2019-10-15T07:42:07Z</dc:date>
    <item>
      <title>Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-corrupts-incoming-JSON-Lines-by-introducing-bogus-x/m-p/501902#M85525</link>
      <description>&lt;P&gt;I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;(For the purposes of this question, please overlook the &lt;CODE&gt;code&lt;/CODE&gt; and &lt;CODE&gt;time&lt;/CODE&gt; properties.)&lt;/P&gt;

&lt;P&gt;In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the &lt;CODE&gt;test&lt;/CODE&gt; property values.&lt;/P&gt;

&lt;P&gt;I was happy to see that it does:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/7805i3BBF61E91D2A400D/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;But wait: where's the not sign?&lt;/P&gt;

&lt;P&gt;I looked at the raw events in Splunk Web:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Note:&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;In case you're wondering, I use a transform to remove the &lt;CODE&gt;code&lt;/CODE&gt; property.&lt;/LI&gt;
&lt;LI&gt;My &lt;CODE&gt;props.conf&lt;/CODE&gt; file specifies &lt;CODE&gt;KV_MODE = json&lt;/CODE&gt;.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;&lt;STRONG&gt;Splunk replaced the not sign in the original incoming JSON Lines with the character sequence &lt;CODE&gt;\xAC&lt;/CODE&gt;!&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;While &lt;CODE&gt;AC&lt;/CODE&gt; is the correct hexadecimal Unicode code point for the not sign (U+00AC), &lt;CODE&gt;\x&lt;/CODE&gt; is not a valid escape sequence in JSON!&lt;/P&gt;

&lt;P&gt;By introducing this escape sequence, Splunk has corrupted the JSON.&lt;/P&gt;
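
&lt;P&gt;As a quick sanity check outside Splunk, here is a minimal Python sketch showing that a standard JSON parser accepts &lt;CODE&gt;\u00AC&lt;/CODE&gt; but rejects &lt;CODE&gt;\xAC&lt;/CODE&gt;. The sample lines are shortened versions of the events above, and the names &lt;CODE&gt;valid_line&lt;/CODE&gt;, &lt;CODE&gt;bogus_line&lt;/CODE&gt; and &lt;CODE&gt;parses_as_json&lt;/CODE&gt; are hypothetical, chosen only for illustration. It says nothing about how Splunk itself stores or parses the data.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;import json

# Hypothetical sample lines (raw strings, so the backslashes reach the parser).
# The first uses a JSON \u escape; the second uses the \x escape
# that appeared in the raw event shown in Splunk Web.
valid_line = r'{"test":"\u00AC (not sign)"}'
bogus_line = r'{"test":"\xAC (not sign)"}'


def parses_as_json(line):
    """Return True if the line is valid JSON, False otherwise."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(valid_line))  # \u00AC is a valid JSON escape
print(parses_as_json(bogus_line))  # \x is not defined by the JSON grammar

# The valid escape decodes to the not sign itself.
print(json.loads(valid_line)["test"] == "\u00ac (not sign)")
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Any consumer that re-parses the raw event as JSON could run into the same rejection, which is why I describe the event as corrupted.&lt;/P&gt;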

&lt;P&gt;This looks like a bug to me.&lt;/P&gt;

&lt;P&gt;I'm wondering what makes the not sign "special": why does it get this "bogus" (in the context of JSON) escaping when the other characters don't? I note that the other characters are ASCII and easy to type on a standard US keyboard, whereas the not sign (U+00AC) lies outside the ASCII range.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;My question(s)&lt;/STRONG&gt;&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;Is this behavior a bug, as I suspect?&lt;/LI&gt;
&lt;LI&gt;How many other characters are affected by this behavior?&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Tue, 15 Oct 2019 07:42:07 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-corrupts-incoming-JSON-Lines-by-introducing-bogus-x/m-p/501902#M85525</guid>
      <dc:creator>Graham_Hanningt</dc:creator>
      <dc:date>2019-10-15T07:42:07Z</dc:date>
    </item>
    <item>
      <title>Re: Splunk corrupts incoming JSON Lines by introducing bogus \x-prefix escape sequence?</title>
      <link>https://community.splunk.com/t5/Getting-Data-In/Splunk-corrupts-incoming-JSON-Lines-by-introducing-bogus-x/m-p/501903#M85526</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;| makeresults 
| eval _raw=" {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"| (vertical bar): \u007c\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"@ (commercial at): \u0040\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"# (number sign, hash): \u0023\"}
 {\"code\":\"variant-characters\",\"time\":\"2019-10-15T10:00:00+08:00\",\"test\":\"¬ (not sign): \u00AC\"}" 
| multikv noheader=t 
| spath 
| fields - _*
| table code time test
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;In Splunk version 8, this is fixed. The search above returns the not sign intact:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;code    time    test
variant-characters  2019-10-15T10:00:00+08:00   | (vertical bar): |
variant-characters  2019-10-15T10:00:00+08:00   @ (commercial at): @
variant-characters  2019-10-15T10:00:00+08:00   # (number sign, hash): #
variant-characters  2019-10-15T10:00:00+08:00   ¬ (not sign): ¬ 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 02 May 2020 04:08:13 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Getting-Data-In/Splunk-corrupts-incoming-JSON-Lines-by-introducing-bogus-x/m-p/501903#M85526</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-05-02T04:08:13Z</dc:date>
    </item>
  </channel>
</rss>

