<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I search by unicode value? in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435964#M167683</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have the following example record:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa","imsi":"xxxxxxxx","imei":"xxxxxxxxx","SGSN":null,"Remote IP Address":"xx.xx.xx.xx","TotalTimeInMS":0}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I cannot search by Type because it is a Unicode value, and Splunk does not parse it correctly.&lt;/P&gt;

&lt;P&gt;There are 2 possible Type values: "\u0006" and "\u0003".&lt;/P&gt;

&lt;P&gt;I am using the following Splunk search:&lt;BR /&gt;
mysearch | spath input=anyparams | search Type="\u0006" &lt;/P&gt;

&lt;P&gt;The problem is that I receive no results.&lt;/P&gt;

&lt;P&gt;How should I write the search when the field contains a Unicode value?&lt;/P&gt;

&lt;P&gt;Thanks in advance,&lt;/P&gt;

&lt;P&gt;Yossi  &lt;/P&gt;</description>
    <pubDate>Thu, 30 Aug 2018 10:49:41 GMT</pubDate>
    <dc:creator>yyossef</dc:creator>
    <dc:date>2018-08-30T10:49:41Z</dc:date>
    <item>
      <title>How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435964#M167683</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have the following example record:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; "msisdn":"xxxxxxxxx","Type":"\u0006","APN":"aaa","imsi":"xxxxxxxx","imei":"xxxxxxxxx","SGSN":null,"Remote IP Address":"xx.xx.xx.xx","TotalTimeInMS":0}
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;I cannot search by Type because it is a Unicode value, and Splunk does not parse it correctly.&lt;/P&gt;

&lt;P&gt;There are 2 possible Type values: "\u0006" and "\u0003".&lt;/P&gt;

&lt;P&gt;I am using the following Splunk search:&lt;BR /&gt;
mysearch | spath input=anyparams | search Type="\u0006" &lt;/P&gt;

&lt;P&gt;The problem is that I receive no results.&lt;/P&gt;

&lt;P&gt;How should I write the search when the field contains a Unicode value?&lt;/P&gt;

&lt;P&gt;Thanks in advance,&lt;/P&gt;

&lt;P&gt;Yossi  &lt;/P&gt;</description>
      <pubDate>Thu, 30 Aug 2018 10:49:41 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435964#M167683</guid>
      <dc:creator>yyossef</dc:creator>
      <dc:date>2018-08-30T10:49:41Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435965#M167684</link>
      <description>&lt;P&gt;@yyossef, if you are searching for a Unicode escape stored as text, you need to escape the backslash by prefixing it with another backslash, i.e. &lt;CODE&gt;"\\u0006"&lt;/CODE&gt; or &lt;CODE&gt;"\\u0003"&lt;/CODE&gt; in your SPL.&lt;/P&gt;

&lt;P&gt;The following is an example of using this in a search filter or an eval function:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt; &amp;lt;yourCurrentSearch&amp;gt;
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;The following is a run-anywhere search based on the sample data provided:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| extract pairdelim="," kvdelim=":"
| eval TypeDescription=case(Type=="\\u0006","ACKNOWLEDGE",Type=="\\u0003","END OF TEXT",true(),"Others")
| search Type="\\u0006" OR TypeDescription="ACKNOWLEDGE"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 30 Aug 2018 14:03:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435965#M167684</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-08-30T14:03:35Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435966#M167685</link>
      <description>&lt;P&gt;Hi @niketnilay,&lt;/P&gt;

&lt;P&gt;Thanks for your prompt response.&lt;BR /&gt;
Still no luck, the search result is empty.&lt;/P&gt;

&lt;P&gt;When using your example, the result came back with only the default value "Others", meaning no match was found.&lt;BR /&gt;
I am not sure that the Unicode value is stored as text; I think it is displayed as text by the system but stored as a Unicode value.&lt;BR /&gt;
Do you have any idea how to verify that, or how to search by Unicode value?&lt;/P&gt;</description>
      <pubDate>Thu, 30 Aug 2018 14:35:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435966#M167685</guid>
      <dc:creator>yyossef</dc:creator>
      <dc:date>2018-08-30T14:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435967#M167686</link>
      <description>&lt;P&gt;@yyossef, I am not sure whether the Type field is actually being extracted, so let us first try a different approach. The following example does not try to extract the Type field. Instead, it searches for the Unicode escape sequences in the &lt;CODE&gt;_raw&lt;/CODE&gt; data.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults
| eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"
| eval TypeDescription=case(searchmatch("\\u0006"),"ACKNOWLEDGE",searchmatch("\\u0003"),"END OF TEXT",true(),"Others")
| search TypeDescription="ACKNOWLEDGE"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 30 Aug 2018 15:02:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435967#M167686</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-08-30T15:02:17Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435968#M167687</link>
      <description>&lt;P&gt;Looking at &lt;A href="https://www.fileformat.info/info/unicode/char/0006/index.htm"&gt;Unicode Character 'ACKNOWLEDGE' (U+0006)&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/5680i8402F60FF0F13A99/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;It tells us that &lt;CODE&gt;\u0006&lt;/CODE&gt; is not a Unicode/UTF-8 encoding of the character; it is the escape notation that several programming languages chose to represent it.&lt;/P&gt;</description>
      <pubDate>Fri, 31 Aug 2018 00:09:43 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435968#M167687</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2018-08-31T00:09:43Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435969#M167688</link>
      <description>&lt;P&gt;Hi @niketnilay,&lt;/P&gt;

&lt;P&gt;Your other suggestion using searchmatch worked.&lt;/P&gt;

&lt;P&gt;| makeresults&lt;BR /&gt;
 | eval _raw="30/08/2018 13:30:27.996;VM1;ASH;AccessModule;processPacketBuffer;MSISDN;xxxxxxxxxxxx;;INFO;;;Return Access ; \"msisdn\":\"xxxxxxxxx\",\"Type\":\"\u0006\",\"APN\":\"aaa\",\"imsi\":\"xxxxxxxx\",\"imei\":\"xxxxxxxxx\",\"SGSN\":null,\"Remote IP Address\":\"xx.xx.xx.xx\",\"TotalTimeInMS\":0}"&lt;BR /&gt;
 | eval TypeDescription=case(&lt;STRONG&gt;searchmatch&lt;/STRONG&gt;("\u0006"),"ACKNOWLEDGE",&lt;STRONG&gt;searchmatch&lt;/STRONG&gt;("\u0004"),"END OF TEXT",true(),"Others")&lt;BR /&gt;
 | search TypeDescription="ACKNOWLEDGE"&lt;/P&gt;

&lt;P&gt;Why does searchmatch work when Type=="\u0006" did not?&lt;/P&gt;</description>
      <pubDate>Sun, 02 Sep 2018 06:05:46 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435969#M167688</guid>
      <dc:creator>yyossef</dc:creator>
      <dc:date>2018-09-02T06:05:46Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435970#M167689</link>
      <description>&lt;P&gt;@yyossef, the Type field is not being automatically extracted as part of search-time field discovery. The &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/SearchReference/ConditionalFunctions#searchmatch.28X.29"&gt;searchmatch&lt;/A&gt; function matches the pattern against the entire raw event, so it does not depend on that field. You would need to create your own &lt;A href="http://docs.splunk.com/Documentation/Splunk/latest/Knowledge/ExtractfieldsinteractivelywithIFX"&gt;Field Extraction&lt;/A&gt; to create a Type field based on a regular expression.&lt;/P&gt;
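
&lt;P&gt;A minimal sketch of such an extraction done inline with &lt;CODE&gt;rex&lt;/CODE&gt;, assuming the raw event contains the literal six-character text &lt;CODE&gt;\u0006&lt;/CODE&gt; (as in the sample record) rather than the actual control character. The regular expression is only an illustration and may need adjusting for your data:&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;&amp;lt;yourCurrentSearch&amp;gt;
| rex field=_raw "\"Type\":\"(?&amp;lt;Type&amp;gt;[^\"]*)\""
| search Type="\\u0006"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;If the pattern works on your events, it could then be saved as a search-time field extraction so that Type is available without the inline rex.&lt;/P&gt;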

&lt;P&gt;I am glad your issue is resolved. Do let us know if you need further help. Do up vote the answer/comments that helped! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Sep 2018 14:31:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435970#M167689</guid>
      <dc:creator>niketn</dc:creator>
      <dc:date>2018-09-02T14:31:23Z</dc:date>
    </item>
    <item>
      <title>Re: How do I search by unicode value?</title>
      <link>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435971#M167690</link>
      <description>&lt;P&gt;Just curious: why are Unicode values not being cleansed/translated before the information gets sent to Stripe? As far as I know, data like this very rarely makes its way into Splunk, and much of what passes as weird UTF-8 codes does not make it into Splunk at all.&lt;/P&gt;

&lt;P&gt;Ian Quick shared example code with us that shows how to test for UTF-8 characters and strip them out: &lt;A href="https://github.com/Shopify/shopify-tracing/commit/816ba2aef3c6ee8a232766028181b7b1ca03a2b1"&gt;https://github.com/Shopify/shopify-tracing/commit/816ba2aef3c6ee8a232766028181b7b1ca03a2b1&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I'd highly recommend cleansing your data before emitting to Stripe. Once the data is in Splunk, 99.9% of the UTF code will be lost and Splunk will not help you debug that issue. Cleansing your output before it hits Stripe is probably the best course of action.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jan 2020 19:47:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/How-do-I-search-by-unicode-value/m-p/435971#M167690</guid>
      <dc:creator>MichaelArsenaul</dc:creator>
      <dc:date>2020-01-21T19:47:56Z</dc:date>
    </item>
  </channel>
</rss>

