<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Remove segmenters in search (lispy) for Norwegian characters in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390597#M172968</link>
    <description>&lt;P&gt;Hi folks. Whenever you run a search in Splunk, you can review the lispy in search.log. For example, if I search for my own username in the main index, the search looks like this: &lt;CODE&gt;index=main hettervi&lt;/CODE&gt;, while the lispy looks like this: &lt;CODE&gt;[AND index::main hettervi]&lt;/CODE&gt;. However, when I use the Norwegian characters &lt;CODE&gt;æ&lt;/CODE&gt;, &lt;CODE&gt;ø&lt;/CODE&gt; and &lt;CODE&gt;å&lt;/CODE&gt;, the words get segmented in the lispy. For example, if I search for the (fictional) Norwegian name "Hælgøvoll", the search looks like this: &lt;CODE&gt;index=main hælgøvoll&lt;/CODE&gt;, but the lispy looks like this: &lt;CODE&gt;[AND index::main h lg voll æ ø]&lt;/CODE&gt;. See the problem?&lt;/P&gt;

&lt;P&gt;I've looked through the documentation for segmenters.conf, but as far as I can see there is no mention of Norwegian characters. Anyone got any tips for how to unlist the Norwegian characters as breakers, both at index time and at search time?&lt;/P&gt;</description>
    <pubDate>Mon, 18 Feb 2019 12:45:16 GMT</pubDate>
    <dc:creator>hettervik</dc:creator>
    <dc:date>2019-02-18T12:45:16Z</dc:date>
    <item>
      <title>Remove segmenters in search (lispy) for Norwegian characters</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390597#M172968</link>
      <description>&lt;P&gt;Hi folks. Whenever you run a search in Splunk, you can review the lispy in search.log. For example, if I search for my own username in the main index, the search looks like this: &lt;CODE&gt;index=main hettervi&lt;/CODE&gt;, while the lispy looks like this: &lt;CODE&gt;[AND index::main hettervi]&lt;/CODE&gt;. However, when I use the Norwegian characters &lt;CODE&gt;æ&lt;/CODE&gt;, &lt;CODE&gt;ø&lt;/CODE&gt; and &lt;CODE&gt;å&lt;/CODE&gt;, the words get segmented in the lispy. For example, if I search for the (fictional) Norwegian name "Hælgøvoll", the search looks like this: &lt;CODE&gt;index=main hælgøvoll&lt;/CODE&gt;, but the lispy looks like this: &lt;CODE&gt;[AND index::main h lg voll æ ø]&lt;/CODE&gt;. See the problem?&lt;/P&gt;
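
&lt;P&gt;To make the pattern concrete, here is a minimal Python sketch that reproduces the lispy terms above. It is not Splunk's actual segmenter: the function name &lt;CODE&gt;sketch_lispy_terms&lt;/CODE&gt; and the rule it applies (every non-ASCII character ends the current token and is emitted as a token of its own, with the tokens deduplicated and sorted) are only my assumptions based on what search.log shows.&lt;/P&gt;

```python
def sketch_lispy_terms(search_term):
    """Guess at the lispy terms for one search word.

    Assumption only: this mimics the output observed in search.log,
    it is not how Splunk is known to implement segmentation.
    """
    tokens = set()
    ascii_run = []
    for ch in search_term.lower():
        if ch.isascii():
            ascii_run.append(ch)
        else:
            # Hypothetical rule: a non-ASCII character closes the current
            # ASCII run and becomes a separate token.
            if ascii_run:
                tokens.add("".join(ascii_run))
                ascii_run = []
            tokens.add(ch)
    if ascii_run:
        tokens.add("".join(ascii_run))
    # Sorting by code point matches the order seen in the lispy.
    return sorted(tokens)


print(" ".join(sketch_lispy_terms("Hælgøvoll")))  # prints: h lg voll æ ø
```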

&lt;P&gt;I've looked through the documentation for segmenters.conf, but as far as I can see there is no mention of Norwegian characters. Anyone got any tips for how to unlist the Norwegian characters as breakers, both at index time and at search time?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Feb 2019 12:45:16 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390597#M172968</guid>
      <dc:creator>hettervik</dc:creator>
      <dc:date>2019-02-18T12:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: Remove segmenters in search (lispy) for Norwegian characters</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390598#M172969</link>
      <description>&lt;P&gt;@hettervi, you need to look at the encoding. &lt;CODE&gt;UTF-8&lt;/CODE&gt;, for example, is an encoding of &lt;CODE&gt;Unicode&lt;/CODE&gt; and can represent the characters of virtually every written language.&lt;/P&gt;

&lt;P&gt;A good place to start is &lt;A href="https://docs.splunk.com/Documentation/SplunkCloud/7.2.3/Data/Configurecharactersetencoding"&gt;Configure character set encoding&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Feb 2019 14:37:33 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390598#M172969</guid>
      <dc:creator>ddrillic</dc:creator>
      <dc:date>2019-02-18T14:37:33Z</dc:date>
    </item>
    <item>
      <title>Re: Remove segmenters in search (lispy) for Norwegian characters</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390599#M172970</link>
      <description>&lt;P&gt;Perhaps. I'll look into it, though the problem isn't that the characters aren't supported; it is that the search head segments the search terms wherever these characters occur. As far as I know, the lispy generated for a search isn't sourcetype dependent.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Feb 2019 12:03:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390599#M172970</guid>
      <dc:creator>hettervik</dc:creator>
      <dc:date>2019-02-19T12:03:24Z</dc:date>
    </item>
    <item>
      <title>Re: Remove segmenters in search (lispy) for Norwegian characters</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390600#M172971</link>
      <description>&lt;P&gt;I've looked into the case some more. An interesting observation is that searching for &lt;CODE&gt;TERM(hælgøvoll)&lt;/CODE&gt; or &lt;CODE&gt;TERM(h*lg*voll)&lt;/CODE&gt; gives no results. This led me to believe that the Norwegian characters &lt;CODE&gt;æ&lt;/CODE&gt;, &lt;CODE&gt;ø&lt;/CODE&gt; and &lt;CODE&gt;å&lt;/CODE&gt; are defined as &lt;EM&gt;major&lt;/EM&gt; breakers. However, if this were the case, they wouldn't be listed in the lispy as shown in my initial question. The only explanation I can come up with for the observed behavior is that there are some "hidden" major breakers &lt;EM&gt;before&lt;/EM&gt; &lt;STRONG&gt;and&lt;/STRONG&gt; &lt;EM&gt;after&lt;/EM&gt; the Norwegian characters &lt;CODE&gt;æ&lt;/CODE&gt;, &lt;CODE&gt;ø&lt;/CODE&gt; and &lt;CODE&gt;å&lt;/CODE&gt;. I'm not sure whether my assumption is correct, or whether this is a bug or a feature.&lt;/P&gt;</description>
      <pubDate>Tue, 19 Feb 2019 15:53:35 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/390600#M172971</guid>
      <dc:creator>hettervik</dc:creator>
      <dc:date>2019-02-19T15:53:35Z</dc:date>
    </item>
    <item>
      <title>Re: Remove segmenters in search (lispy) for Norwegian characters</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/605608#M210600</link>
      <description>&lt;P&gt;Yes, it appears that most (if not all) non-ASCII characters are major breakers.&lt;/P&gt;&lt;P&gt;The lispy I see for a simple search for &lt;CODE&gt;тестирование&lt;/CODE&gt; is:&lt;/P&gt;&lt;PRE&gt;[ AND index::main а в е и н о р с т ]&lt;/PRE&gt;&lt;P&gt;This is a bigger issue if the data is ingested in ASCII JSON format.&lt;/P&gt;&lt;PRE&gt;{"data": "\u0442\u0435\u0441\u0442\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435"}&lt;/PRE&gt;&lt;P&gt;If the data above is ingested, a search for &lt;CODE&gt;data="тестирование"&lt;/CODE&gt; or &lt;CODE&gt;"тестирование"&lt;/CODE&gt; will not find the data. An initial search term such as &lt;CODE&gt;u04*&lt;/CODE&gt; must be included. A similar issue occurs when the raw JSON includes an escaped newline: a string like &lt;CODE&gt;{"data": "line_one\nline_two"}&lt;/CODE&gt; cannot be found with a search for &lt;CODE&gt;line_two&lt;/CODE&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2022 14:43:39 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Remove-segmenters-in-search-lispy-for-Norwegian-characters/m-p/605608#M210600</guid>
      <dc:creator>malvidin</dc:creator>
      <dc:date>2022-07-14T14:43:39Z</dc:date>
    </item>
  </channel>
</rss>

