<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Clarification needed on eval split() function in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507876#M141963</link>
    <description>&lt;P&gt;I don't think you can match on multi-character emoji. Whether separating by UTF-8 byte (split) or by Unicode character (rex), Splunk only has to check whether the code point is valid.&lt;/P&gt;&lt;P&gt;There are entire projects out there that build the regex from the current Unicode definition. You could possibly create an app that periodically updates it.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/mathiasbynens/emoji-regex" target="_blank"&gt;https://github.com/mathiasbynens/emoji-regex&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You could suggest this as a feature at&amp;nbsp;&lt;A href="https://ideas.splunk.com/" target="_blank"&gt;https://ideas.splunk.com/&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 07 Jul 2020 16:02:06 GMT</pubDate>
    <dc:creator>malvidin</dc:creator>
    <dc:date>2020-07-07T16:02:06Z</dc:date>
    <item>
      <title>Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507081#M141822</link>
      <description>&lt;P&gt;For the following search command, what is the expected output?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;| makeresults
| eval text_string = "I:red_heart:Splunk"
| eval text_split = split(text_string, "")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would expect a text_split field that contains either an array of characters like this:&lt;/P&gt;&lt;P class="lia-align-left lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier"&gt;text_split == [ 'I', '&lt;span class="lia-unicode-emoji" title=":red_heart:"&gt;❤️&lt;/span&gt;', 'S', 'p', 'l', 'u', 'n', 'k' ]&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;or, if split by byte, an array like this (the rendering potentially depends on the locale):&lt;/P&gt;&lt;P class="lia-align-left lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier"&gt;text_split == [ 'I', 'â', '&amp;#157;', '¤', 'ï', '¸', '¿', 'S', 'p', 'l', 'u', 'n', 'k' ]&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;But I would not expect the current output, which is:&lt;/P&gt;&lt;P class="lia-indent-padding-left-30px"&gt;&lt;FONT face="courier new,courier"&gt;text_split == [ 'I', '&lt;SPAN&gt;�&lt;/SPAN&gt;', '&lt;SPAN&gt;�&lt;/SPAN&gt;', '&lt;SPAN&gt;�&lt;/SPAN&gt;', '&lt;SPAN&gt;�&lt;/SPAN&gt;', '&lt;SPAN&gt;�&lt;/SPAN&gt;', '&lt;SPAN&gt;�&lt;/SPAN&gt;', 'S', 'p', 'l', 'u', 'n', 'k' ]&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;The use of characters that aren't fixed width also screws up search entry highlighting and text selection, but that isn't related to the split() function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;| eval text_string = "I:red_heart:Splunk"  `comment("Try highlighting a word in this comment in the SPL Editor")`&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It looks like mvjoin() reverses the split(), but mvcombine fails.&lt;BR /&gt;&lt;BR /&gt;&lt;EM&gt;(An edit attempt failed to add the red heart back to the code samples, so it is replaced with :red_heart: there.)&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jul 2020 07:39:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507081#M141822</guid>
      <dc:creator>malvidin</dc:creator>
      <dc:date>2020-07-03T07:39:26Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507159#M141835</link>
      <description>&lt;P&gt;Interesting find. It's not surprising that split() does not handle certain Unicode code points correctly; I imagine that's a fairly rare edge case when dealing with Splunked data&amp;nbsp;&lt;span class="lia-unicode-emoji" title=":red_heart:"&gt;❤️&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I guess both the split handling and the editor behavior are bugs, because in the following search&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| eval t=text_string
| eval tl=len(t)
| rex field=t mode=sed "s/❤️/_LuuuV_/"&lt;/LI-CODE&gt;&lt;P&gt;len() returns 9, which correctly counts the two Unicode code points of the emoji, and rex replaces the emoji correctly (less surprising).&lt;/P&gt;&lt;P&gt;You might expect split() to return the two Unicode code points as separate text_split values, the first with the black heart and the second with some other (unknown) character. The fact that it produces 6 values instead indicates that it's misinterpreting the UTF-8.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jul 2020 23:13:56 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507159#M141835</guid>
      <dc:creator>bowesmana</dc:creator>
      <dc:date>2020-07-02T23:13:56Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507212#M141840</link>
      <description>&lt;P&gt;Because mvjoin() reverses the operation, the back-end data does not appear to be lost. And since the emoji is split into 6 values (three bytes for U+2764 and three for U+FE0F), it appears that the back-end data is UTF-8 and is being split into its individual bytes.&lt;/P&gt;&lt;P&gt;The second Unicode character in the red heart emoji is variation selector 16 (U+FE0F).&lt;/P&gt;&lt;P&gt;rex splits by character, but split() splits by UTF-8 byte.&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| rex field=text_string max_match=0 "(?P&amp;lt;text_split&amp;gt;.)" &lt;/LI-CODE&gt;</description>
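The byte-versus-character distinction described in this reply can be reproduced outside Splunk. The following is a minimal Python sketch, not Splunk's implementation: it only assumes that the string is UTF-8 encoded, and the variable names are illustrative. It shows why a per-character split yields 9 values while a per-byte split yields 13, and why rejoining the bytes loses nothing.

```python
# Red heart emoji: U+2764 (HEAVY BLACK HEART) followed by U+FE0F (VARIATION SELECTOR-16)
text = "I\u2764\ufe0fSplunk"

# Split by Unicode code point, comparable to the rex "." pattern in the thread
code_points = list(text)

# Split by UTF-8 byte, comparable to what split(text_string, "") appears to do
utf8_bytes = [bytes([b]) for b in text.encode("utf-8")]

# Joining the single bytes again restores the original string,
# which matches the observation that mvjoin() reverses the split
rejoined = b"".join(utf8_bytes).decode("utf-8")

print(len(code_points))                      # 9
print(len(utf8_bytes))                       # 13
print([b.hex() for b in utf8_bytes[1:7]])    # the six bytes of the emoji
print(rejoined == text)                      # True
```

Each of the six emoji bytes is invalid UTF-8 when taken alone, which would be consistent with the six replacement characters shown in the original question.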
      <pubDate>Fri, 03 Jul 2020 07:59:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507212#M141840</guid>
      <dc:creator>malvidin</dc:creator>
      <dc:date>2020-07-03T07:59:25Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507377#M141867</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
| eval text ="I❤️Splunk"
| rex field=text max_match=0 "(?&amp;lt;text_split&amp;gt;[\w\p{S}])"&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
| eval text ="I".printf("%c",tonumber("2764",16)).printf("%c",tonumber("FE0F",16))."Splunk"
| rex field=text max_match=0 "(?&amp;lt;text_split&amp;gt;[\w\p{S}])"&lt;/LI-CODE&gt;&lt;P&gt;That's very interesting.&amp;nbsp;&lt;SPAN&gt;&lt;span class="lia-unicode-emoji" title=":red_heart:"&gt;❤️&lt;/span&gt; consists of multiple Unicode characters, but &lt;STRONG&gt;\p{S}&lt;/STRONG&gt; matches only a single Unicode character.&lt;BR /&gt;How can I match a multi-character sequence (e.g. an emoji)?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Jul 2020 23:58:47 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507377#M141867</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-07-04T23:58:47Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507876#M141963</link>
      <description>&lt;P&gt;I don't think you can match on multi-character emoji. Whether separating by UTF-8 byte (split) or by Unicode character (rex), Splunk only has to check whether the code point is valid.&lt;/P&gt;&lt;P&gt;There are entire projects out there that build the regex from the current Unicode definition. You could possibly create an app that periodically updates it.&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/mathiasbynens/emoji-regex" target="_blank"&gt;https://github.com/mathiasbynens/emoji-regex&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You could suggest this as a feature at&amp;nbsp;&lt;A href="https://ideas.splunk.com/" target="_blank"&gt;https://ideas.splunk.com/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jul 2020 16:02:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/507876#M141963</guid>
      <dc:creator>malvidin</dc:creator>
      <dc:date>2020-07-07T16:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/508015#M141987</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;| makeresults
| eval text ="I".printf("%c",tonumber("2764",16)).printf("%c",tonumber("FE0F",16))."Splunk"
| rex field=text max_match=0 "(?&amp;lt;text_split&amp;gt;\w|\p{S}.)"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hi &lt;a href="https://community.splunk.com/t5/user/viewprofilepage/user-id/23535"&gt;@malvidin&lt;/a&gt;, I was able to match it.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Jul 2020 09:19:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/508015#M141987</guid>
      <dc:creator>to4kawa</dc:creator>
      <dc:date>2020-07-08T09:19:37Z</dc:date>
    </item>
    <item>
      <title>Re: Clarification needed on eval split() function</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/508372#M142049</link>
      <description>&lt;P&gt;Based on your response, I think this just gets more complicated depending on how many emoji we want to keep together.&lt;/P&gt;&lt;LI-CODE lang="c"&gt;| makeresults
| eval text ="I ".printf("%c",tonumber("2764",16)).printf("%c",tonumber("FE0F",16))." Splunk &amp;amp; " 
    + printf("%c",tonumber("1F469",16)) 
    + printf("%c",tonumber("1F3FB",16)) 
    + printf("%c",tonumber("200D",16)) 
    + printf("%c",tonumber("1F468",16)) 
    + printf("%c",tonumber("1F3FD",16)) 
    + printf("%c",tonumber("200D",16)) 
    + printf("%c",tonumber("1F467",16)) 
    + printf("%c",tonumber("1F3FF",16)) 
    + " &amp;amp; "
    + printf("%c",tonumber("1F441",16)) 
    + printf("%c",tonumber("FE0F",16)) 
    + printf("%c",tonumber("200D",16)) 
    + printf("%c",tonumber("1F5E8",16)) 
    + printf("%c",tonumber("FE0F",16)) 
| rex field=text max_match=0 "(?&amp;lt;text_split&amp;gt;\p{So}[\x{1F3FB}-\x{1F3FF}]?(?:\x{200D}\p{So}[\x{1F3FB}-\x{1F3FF}]?(?:\x{200D}\p{So}[\x{1F3FB}-\x{1F3FF}]?)|[\x{FE00}-\x{FE0F}])|\p{So}[\x{1F3FB}-\x{1F3FF}]|.)"&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
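The idea behind the regex above, keeping a base symbol together with its skin tone modifier, variation selector, and zero-width-joiner continuations, can also be sketched procedurally. The Python below is a simplified illustration and not full Unicode grapheme segmentation. The function name split_keep_sequences and its attach rule are assumptions made for this example: any variation selector, skin tone modifier, or ZWJ is appended to the previous value, and the character after a ZWJ is appended as well.

```python
ZWJ = "\u200d"                                # ZERO WIDTH JOINER
VARIATION_SELECTORS = range(0xFE00, 0xFE10)   # U+FE00..U+FE0F
SKIN_TONES = range(0x1F3FB, 0x1F400)          # U+1F3FB..U+1F3FF

def split_keep_sequences(text):
    """Split text per code point, but keep simple emoji sequences together."""
    clusters = []
    join_next = False  # True when the previous code point was a ZWJ
    for ch in text:
        cp = ord(ch)
        attaches = cp in VARIATION_SELECTORS or cp in SKIN_TONES or ch == ZWJ
        if clusters and (attaches or join_next):
            clusters[-1] += ch      # extend the current sequence
        else:
            clusters.append(ch)     # start a new value
        join_next = ch == ZWJ
    return clusters

# Code points taken from the search above
heart = "\u2764\ufe0f"
family = "\U0001F469\U0001F3FB\u200d\U0001F468\U0001F3FD\u200d\U0001F467\U0001F3FF"
eye_bubble = "\U0001F441\ufe0f\u200d\U0001F5E8\ufe0f"

parts = split_keep_sequences("I" + heart + "S" + family + eye_bubble)
print(len(parts))  # 5
```

Because the rule does not limit how many ZWJ-joined parts a sequence may contain, it does not grow with the length of the sequence, at the cost of also attaching a stray ZWJ to whatever precedes it.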
      <pubDate>Thu, 09 Jul 2020 18:09:44 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Clarification-needed-on-eval-split-function/m-p/508372#M142049</guid>
      <dc:creator>malvidin</dc:creator>
      <dc:date>2020-07-09T18:09:44Z</dc:date>
    </item>
  </channel>
</rss>

