<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: rex sed strings different length in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410769#M118544</link>
    <description>&lt;P&gt;Can't think of a way to do it in a single pass, but this works: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval data="Jûán Pérëz Ä Žîs Çópú Ö'ñó", origdata=data
| rex field="data" mode=sed "s/[ÀÁÂÃÄ]/A/g"
| rex field="data" mode=sed "s/[Ç]/C/g"
| rex field="data" mode=sed "s/[ÈÉÊË]/E/g"
| rex field="data" mode=sed "s/[Ñ]/N/g"
| rex field="data" mode=sed "s/[ÒÓÔÕÖ]/O/g"
| rex field="data" mode=sed "s/[Š]/S/g"
| rex field="data" mode=sed "s/[ÙÚÛÜ]/U/g"
| rex field="data" mode=sed "s/[ÝŸ]/Y/g"
| rex field="data" mode=sed "s/[Ž]/Z/g"
| rex field="data" mode=sed "s/[àáâãäª]/a/g"
| rex field="data" mode=sed "s/[ç]/c/g"
| rex field="data" mode=sed "s/[èéêë]/e/g"
| rex field="data" mode=sed "s/[ìíîï]/i/g"
| rex field="data" mode=sed "s/[ñ]/n/g"
| rex field="data" mode=sed "s/[òóôöõº]/o/g"
| rex field="data" mode=sed "s/[ùúûü]/u/g"
| rex field="data" mode=sed "s/[ýÿ]/y/g"
| rex field="data" mode=sed "s/[š]/s/g"
| rex field="data" mode=sed "s/[ž]/z/g"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Output: &lt;/P&gt;

&lt;P&gt;_time       2018-05-28 13:52:34 &lt;BR /&gt;
origdata        Jûán Pérëz Ä Žîs Çópú Ö'ñó&lt;BR /&gt;
data        Juan Perez A Zis Copu O'no  &lt;/P&gt;</description>
    <pubDate>Mon, 28 May 2018 17:55:11 GMT</pubDate>
    <dc:creator>darrenfuller</dc:creator>
    <dc:date>2018-05-28T17:55:11Z</dc:date>
    <item>
      <title>rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410764#M118539</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;

&lt;P&gt;Can somebody please explain to me WTF is happening here?&lt;BR /&gt;
My question is quite simple. I want to replace [áéíóú] with [aeiou] using one single rex, anywhere in the string, with a direct mapping from á to a, é to e, and so on. For example, "José Ramón González" should become "Jose Ramon Gonzalez".&lt;BR /&gt;
I already know how to do that with 5 regexes and a string replace, but I need to do it using one single rex (using sed mode is fine).&lt;BR /&gt;
I found out that in sed, &lt;CODE&gt;y/àéíóú/aeiou/&lt;/CODE&gt; (transliteration) does exactly that (you can try &lt;CODE&gt;sed y/àéíóú/aeiou/&lt;/CODE&gt; in a Linux terminal).&lt;BR /&gt;
However, the magic comes in Splunk. I have this Splunk regex:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;| rex mode=sed field=name2 "y/á/a/"&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;And the result (in Splunk 6.3.1 and 7.1.1) is:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Error in 'rex' command: Failed to initialize sed. 'á' and 'a' are different length.&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;Ok... WTF!? However, I decided to try something like this:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;| rex mode=sed field=name2 "y/á/aa/"&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;And the result is this one:&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="![alt text][1]"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/4999i4454ACE014891897/image-size/large?v=v2&amp;amp;px=999" role="button" title="![alt text][1]" alt="![alt text][1]" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;WTF!?? I think it is an encoding thing (UTF-8 to UTF-16), but I don't know how to solve it.&lt;BR /&gt;
Can somebody please help me? Is there a way to explicitly tell Splunk which encoding I'm using and want to use in the regex? I have already defined the extraction as UTF-8. Why does this work perfectly in Linux, but not in Splunk??&lt;BR /&gt;
As you can check here: &lt;A href="http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/rex"&gt;http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/rex&lt;/A&gt; Splunk supports the &lt;STRONG&gt;y/&lt;/STRONG&gt; sed substitution.&lt;/P&gt;

&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 25 May 2018 09:00:32 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410764#M118539</guid>
      <dc:creator>faguilar</dc:creator>
      <dc:date>2018-05-25T09:00:32Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410765#M118540</link>
      <description>&lt;P&gt;It's working at my end. It must be a syntax problem.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults 
| eval data="àéíóú" 
| rex field=data mode=sed "s\àéíóú\aeiou\g"
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 25 May 2018 09:50:06 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410765#M118540</guid>
      <dc:creator>mayurr98</dc:creator>
      <dc:date>2018-05-25T09:50:06Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410766#M118541</link>
      <description>&lt;P&gt;Or a difference in the character encoding settings of your Splunk Web / browser / OS?&lt;/P&gt;

&lt;P&gt;If I type &lt;CODE&gt;à&lt;/CODE&gt; in a Notepad++ document set to UTF-8, it reports length 2, compared to length 1 for &lt;CODE&gt;a&lt;/CODE&gt;. If I open a fresh Notepad++ window set to ANSI encoding and type the same character &lt;CODE&gt;à&lt;/CODE&gt;, it shows as length 1. So I can imagine that in certain cases Splunk also interprets it as a 2-byte character and throws that mismatch error?&lt;/P&gt;</description>
      <pubDate>Fri, 25 May 2018 11:30:37 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410766#M118541</guid>
      <dc:creator>FrankVl</dc:creator>
      <dc:date>2018-05-25T11:30:37Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410767#M118542</link>
      <description>&lt;P&gt;Hi @mayurr98,&lt;/P&gt;

&lt;P&gt;Thank you for your answer, but maybe I expressed my problem the wrong way.&lt;BR /&gt;
It's not a syntax problem, and I do not need that simple substitution (which I already know how to do). That's why I mentioned &lt;CODE&gt;sed y/àéíóú/aeiou/&lt;/CODE&gt;, which works for my scenario in the Linux terminal.&lt;/P&gt;

&lt;P&gt;I want to substitute those characters anywhere in the string, not in that exact order. Meaning that if I have the name &lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;P&gt;José González&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;that &lt;CODE&gt;sed y/àéíóú/aeiou/&lt;/CODE&gt; will substitute it perfectly: á for an a, é for an e, and so on.&lt;/P&gt;

&lt;P&gt;My problem here is that in Splunk, the sed mode doesn't seem to work like the Linux sed command.&lt;/P&gt;

&lt;P&gt;I will update my question to avoid any ambiguity.&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 09:44:04 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410767#M118542</guid>
      <dc:creator>faguilar</dc:creator>
      <dc:date>2018-05-28T09:44:04Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410768#M118543</link>
      <description>&lt;P&gt;Here is my search with example data:&lt;/P&gt;

&lt;P&gt;&lt;CODE&gt;| makeresults &lt;BR /&gt;
 | eval data="Juán Pérez Dís Tópú", data1=data&lt;BR /&gt;
 | rex field=data1 mode=sed "y/áéíóú/aaeeiioouu/"&lt;BR /&gt;
| table data*&lt;/CODE&gt;&lt;/P&gt;

&lt;P&gt;This is my output:&lt;/P&gt;

&lt;P&gt;data     ---------------------------     data1&lt;BR /&gt;
Juán Pérez Dís Tópú     -----     Juaan Paerez Dais Taopau&lt;/P&gt;

&lt;P&gt;And if I use the command &lt;CODE&gt;| rex field=data1 mode=sed "y/áéíóú/aeiou/"&lt;/CODE&gt; the result is:&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&lt;EM&gt;Error in 'rex' command: Failed to initialize sed. 'áéíóú' and 'aeiou' are different length.&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 10:09:17 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410768#M118543</guid>
      <dc:creator>faguilar</dc:creator>
      <dc:date>2018-05-28T10:09:17Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410769#M118544</link>
      <description>&lt;P&gt;Can't think of a way to do it in a single pass, but this works: &lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| makeresults | eval data="Jûán Pérëz Ä Žîs Çópú Ö'ñó", origdata=data
| rex field="data" mode=sed "s/[ÀÁÂÃÄ]/A/g"
| rex field="data" mode=sed "s/[Ç]/C/g"
| rex field="data" mode=sed "s/[ÈÉÊË]/E/g"
| rex field="data" mode=sed "s/[Ñ]/N/g"
| rex field="data" mode=sed "s/[ÒÓÔÕÖ]/O/g"
| rex field="data" mode=sed "s/[Š]/S/g"
| rex field="data" mode=sed "s/[ÙÚÛÜ]/U/g"
| rex field="data" mode=sed "s/[ÝŸ]/Y/g"
| rex field="data" mode=sed "s/[Ž]/Z/g"
| rex field="data" mode=sed "s/[àáâãäª]/a/g"
| rex field="data" mode=sed "s/[ç]/c/g"
| rex field="data" mode=sed "s/[èéêë]/e/g"
| rex field="data" mode=sed "s/[ìíîï]/i/g"
| rex field="data" mode=sed "s/[ñ]/n/g"
| rex field="data" mode=sed "s/[òóôöõº]/o/g"
| rex field="data" mode=sed "s/[ùúûü]/u/g"
| rex field="data" mode=sed "s/[ýÿ]/y/g"
| rex field="data" mode=sed "s/[š]/s/g"
| rex field="data" mode=sed "s/[ž]/z/g"
&lt;/CODE&gt;&lt;/PRE&gt;

&lt;P&gt;Output: &lt;/P&gt;

&lt;P&gt;_time       2018-05-28 13:52:34 &lt;BR /&gt;
origdata        Jûán Pérëz Ä Žîs Çópú Ö'ñó&lt;BR /&gt;
data        Juan Perez A Zis Copu O'no  &lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 17:55:11 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410769#M118544</guid>
      <dc:creator>darrenfuller</dc:creator>
      <dc:date>2018-05-28T17:55:11Z</dc:date>
    </item>
    <item>
      <title>Re: rex sed strings different length</title>
      <link>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410770#M118545</link>
      <description>&lt;P&gt;Thanks for the answer @darrenfuller, but I already know how to do it the way you suggest. I need to do it in a single line, using transliteration as with the sed y/ command.&lt;/P&gt;</description>
      <pubDate>Tue, 29 May 2018 10:02:15 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/rex-sed-strings-different-length/m-p/410770#M118545</guid>
      <dc:creator>faguilar</dc:creator>
      <dc:date>2018-05-29T10:02:15Z</dc:date>
    </item>
  </channel>
</rss>

