Hi!
Can somebody please explain to me WTF is happening here?
My question is quite simple. I want to replace [áéíóú] with [aeiou] using one single regex, anywhere in the string, with a direct mapping of á to a, é to e, and so on. For example, "José Ramón González" should become "Jose Ramon Gonzalez".
I already know how to do that with 5 regexes and a string replace. But I need to do it with one single regex (using sed mode is fine).
I found out that in sed mode, the transliteration command y/àéíóú/aeiou/ does that perfectly (you can try sed y/àéíóú/aeiou/ in the Linux terminal).
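For comparison, the behaviour expected from y/ is a one-to-one mapping between characters, wherever they occur in the string. A minimal Python sketch of that mapping follows. Python is used purely for illustration and is not part of the original question, and the variable names are made up:

```python
# Character-level transliteration: each accented vowel maps to exactly one
# plain vowel, regardless of where it appears in the string.
table = str.maketrans("áéíóú", "aeiou")

name = "José Ramón González"
plain = name.translate(table)
print(plain)
```

The mapping works on whole characters, not bytes, which is the behaviour sed shows in a UTF-8 locale.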
However, the magic comes in Splunk. I have this Splunk regex:
| rex mode=sed field=name2 "y/á/a/"
And the result (in Splunk 6.3.1 and 7.1.1) is:
Error in 'rex' command: Failed to initialize sed. 'á' and 'a' are different length.
Ok... WTF!? However, I decided to try something like this:
| rex mode=sed field=name2 "y/á/aa/"
And the result is this one:
WTF!?? I think it is an encoding thing (UTF-8 to UTF-16), but I don't know how to solve it.
Can somebody please help me? Is there a way to explicitly tell Splunk which encoding I'm using and which one I want to use in the regex? I have already defined the extraction as UTF-8. Why does this work perfectly in Linux, but not in Splunk??
As you can check here: http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/rex Splunk supports the sed y/ substitution.
Thank you
Can't think of a way to do it in a single pass, but this works:
| makeresults | eval data="Jûán Pérëz Ä Žîs Çópú Ö'ñó", origdata=data
| rex field="data" mode=sed "s/[ÀÁÂÃÄ]/A/g"
| rex field="data" mode=sed "s/[Ç]/C/g"
| rex field="data" mode=sed "s/[ÈÉÊË]/E/g"
| rex field="data" mode=sed "s/[Ñ]/N/g"
| rex field="data" mode=sed "s/[ÒÓÔÕÖ]/O/g"
| rex field="data" mode=sed "s/[Š]/S/g"
| rex field="data" mode=sed "s/[ÙÚÛÜ]/U/g"
| rex field="data" mode=sed "s/[ÝŸ]/Y/g"
| rex field="data" mode=sed "s/[Ž]/Z/g"
| rex field="data" mode=sed "s/[àáâãäª]/a/g"
| rex field="data" mode=sed "s/[ç]/c/g"
| rex field="data" mode=sed "s/[èéêë]/e/g"
| rex field="data" mode=sed "s/[ìíîï]/i/g"
| rex field="data" mode=sed "s/[ñ]/n/g"
| rex field="data" mode=sed "s/[òóôöõº]/o/g"
| rex field="data" mode=sed "s/[ùúûü]/u/g"
| rex field="data" mode=sed "s/[ýÿ]/y/g"
| rex field="data" mode=sed "s/[š]/s/g"
| rex field="data" mode=sed "s/[ž]/z/g"
Output:
_time 2018-05-28 13:52:34
origdata Jûán Pérëz Ä Žîs Çópú Ö'ñó
data Juan Perez A Zis Copu O'no
Thanks for the answer @darrenfuller, but I already know how to do it like you suggest. I need to do it in a single line, using the transliteration like in sed mode y/.
It's working on my end. Must be a syntax problem.
| makeresults
| eval data="àéíóú"
| rex field=data mode=sed "s\àéíóú\aeiou\g"
Or a difference in the character encoding settings of your Splunk Web / browser / OS?
If I type à in a Notepad++ document set to UTF-8, it also reports length 2, compared to length 1 for a. If I open a fresh Notepad++ window set to ANSI encoding and type the same character à, it shows as length 1. So I can imagine that in certain cases Splunk will interpret it as a 2-byte character as well and throw that mismatch error?
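The length difference described above can be checked outside Notepad++. The following is a small Python sketch (the helper name byte_lengths is made up for illustration) that compares the character count with the byte count in UTF-8 and in Latin-1, a single-byte encoding used here as a stand-in for the ANSI setting:

```python
def byte_lengths(ch: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 bytes, Latin-1 bytes) for a string."""
    return (len(ch), len(ch.encode("utf-8")), len(ch.encode("latin-1")))

for ch in ("a", "à", "á"):
    chars, utf8_len, latin1_len = byte_lengths(ch)
    print(f"{ch}: chars={chars} utf8_bytes={utf8_len} latin1_bytes={latin1_len}")
```

An accented vowel is one character but two bytes in UTF-8, so any tool that compares lengths in bytes will see á and a as different lengths.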
Hi @mayurr98,
Thank you for your answer, but maybe I expressed my problem in the wrong way.
It's not a syntax problem, and I do not need that simple substitution (which I already know how to do). That's why I said that I used sed y/àéíóú/aeiou/, which works for my scenario in the Linux terminal.
I want to substitute those characters anywhere in the string, not in that exact order. Meaning that if I have the name José González, sed y/àéíóú/aeiou/ will substitute it perfectly: á becomes a, é becomes e, and so on.
My problem here is that in Splunk, the sed mode doesn't seem to work like the Linux sed command.
I will update my question to avoid any ambiguity.
For my example search:
| makeresults
| eval data="Juán Pérez Dís Tópú", data1=data
| rex field=data1 mode=sed "y/áéíóú/aaeeiioouu/"
| table data*
This is my output:
data --------------------------- data1
Juán Pérez Dís Tópú ----- Juaan Paerez Dais Taopau
And if I use the command | rex field=data1 mode=sed "y/áéíóú/aeiou/"
the result is:
Error in 'rex' command: Failed to initialize sed. 'áéíóú' and 'aeiou' are different length.
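Both results are consistent with the transliteration being applied byte by byte to UTF-8 data instead of character by character. This is only a hypothesis based on the output in this thread, not a confirmed description of Splunk internals. A minimal Python sketch (the function name bytewise_y is made up, and the "first mapping for a repeated byte wins" rule is an assumption) that mimics such a byte-level y/ command:

```python
def bytewise_y(text: str, src: str, dst: str) -> str:
    """Mimic a y/src/dst/ transliteration that operates on UTF-8 bytes."""
    src_b = src.encode("utf-8")
    dst_b = dst.encode("utf-8")
    # Length check in bytes: "á" is 2 bytes, "a" is 1 byte.
    if len(src_b) != len(dst_b):
        raise ValueError(f"'{src}' and '{dst}' are different length.")
    table = {}
    for s, d in zip(src_b, dst_b):
        # Assumption: the first mapping seen for a byte is kept.
        table.setdefault(s, d)
    out = bytes(table.get(b, b) for b in text.encode("utf-8"))
    return out.decode("utf-8", errors="replace")

result = bytewise_y("Juán Pérez Dís Tópú", "áéíóú", "aaeeiioouu")
print(result)

try:
    bytewise_y("Juán Pérez Dís Tópú", "áéíóú", "aeiou")
except ValueError as err:
    print("Error:", err)
```

All five accented vowels share the same UTF-8 lead byte (0xC3), which under this sketch maps to a. That would explain why é turns into ae, í into ai, and so on in the output above.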