rex sed strings different length

faguilar
Path Finder

Hi!

Can somebody please explain to me WTF is happening here?
My question is quite simple. I want to replace [áéíóú] with [aeiou] using one single rex, anywhere in the string, with a direct mapping between á and a, é and e, and so on. For example, "José Ramón González" becomes "Jose Ramon Gonzalez".
I already know how to do that with 5 regexes and a string replace, but I need to do it with one single rex (using sed mode is fine).
I found out that in sed, y/àéíóú/aeiou/ (transliteration) does this perfectly (you can try sed y/àéíóú/aeiou/ in a Linux terminal).
However, the magic comes in Splunk. I have this rex command:

| rex mode=sed field=name2 "y/á/a/"

And the result (in Splunk 6.3.1 and 7.1.1) is:

Error in 'rex' command: Failed to initialize sed. 'á' and 'a' are different length.

Ok... WTF!? However, I decided to try something like this:

| rex mode=sed field=name2 "y/á/aa/"

And the result is this one:

(screenshot of the result; image not preserved)

WTF!?? I think it is an encoding thing (UTF-8 to UTF-16), but I don't know how to solve it.
Can somebody please help me? Is there a way to explicitly tell Splunk which encoding I'm using and want to use in the regex? I have already defined the extraction as UTF-8. Why does this work perfectly in Linux but not in Splunk?
As you can check here: http://docs.splunk.com/Documentation/Splunk/6.3.1/SearchReference/rex Splunk supports the sed y/ transliteration.

Thank you
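
For reference, the character-by-character behaviour being asked for can be sketched outside Splunk. The snippet below is only an illustration in Python (not SPL, and not how Splunk implements rex); the names ACCENTED, PLAIN and transliterate are made up for this example. It maps each accented vowel to its plain counterpart wherever it appears, which is what a character-aware y/áéíóú/aeiou/ is expected to do.

```python
# Character-level transliteration, analogous to a character-aware y/áéíóú/aeiou/.
# Illustrative only: ACCENTED, PLAIN and transliterate are hypothetical names.
ACCENTED = "áéíóú"
PLAIN = "aeiou"

# str.maketrans requires both strings to contain the same number of characters.
TABLE = str.maketrans(ACCENTED, PLAIN)


def transliterate(text):
    """Replace each accented vowel with its plain counterpart, anywhere in text."""
    return text.translate(TABLE)


print(transliterate("José Ramón González"))
```

Here the two strings are compared by character count (5 and 5), so no length mismatch arises.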


darrenfuller
Contributor

Can't think of a way to do it in a single pass, but this works:

| makeresults | eval data="Jûán Pérëz Ä Žîs Çópú Ö'ñó", origdata=data
| rex field="data" mode=sed "s/[ÀÁÂÃÄ]/A/g"
| rex field="data" mode=sed "s/[Ç]/C/g"
| rex field="data" mode=sed "s/[ÈÉÊË]/E/g"
| rex field="data" mode=sed "s/[Ñ]/N/g"
| rex field="data" mode=sed "s/[ÒÓÔÕÖ]/O/g"
| rex field="data" mode=sed "s/[Š]/S/g"
| rex field="data" mode=sed "s/[ÙÚÛÜ]/U/g"
| rex field="data" mode=sed "s/[ÝŸ]/Y/g"
| rex field="data" mode=sed "s/[Ž]/Z/g"
| rex field="data" mode=sed "s/[àáâãäª]/a/g"
| rex field="data" mode=sed "s/[ç]/c/g"
| rex field="data" mode=sed "s/[èéêë]/e/g"
| rex field="data" mode=sed "s/[ìíîï]/i/g"
| rex field="data" mode=sed "s/[ñ]/n/g"
| rex field="data" mode=sed "s/[òóôöõº]/o/g"
| rex field="data" mode=sed "s/[ùúûü]/u/g"
| rex field="data" mode=sed "s/[ýÿ]/y/g"
| rex field="data" mode=sed "s/[š]/s/g"
| rex field="data" mode=sed "s/[ž]/z/g"

Output:

_time 2018-05-28 13:52:34
origdata Jûán Pérëz Ä Žîs Çópú Ö'ñó
data Juan Perez A Zis Copu O'no


faguilar
Path Finder

Thanks for the answer @darrenfuller, but I already know how to do it the way you suggest. I need to do it in a single line, using transliteration as in sed's y/ command.


mayurr98
Super Champion

It's working on my end. Must be a syntax problem.

| makeresults 
| eval data="àéíóú" 
| rex field=data mode=sed "s/àéíóú/aeiou/g"

FrankVl
Ultra Champion

Or a difference in the character encoding settings of your Splunk Web / browser / OS?

If I type à in a Notepad++ document set to UTF-8, it reports length 2, compared to length 1 for a. If I open a fresh Notepad++ window set to ANSI encoding and type the same character à, it shows as length 1. So I can imagine that in certain cases Splunk also interprets it as a 2-byte character and throws that mismatch error.
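
The length difference described above is easy to check outside Notepad++. This is a small Python sketch (the helper name lengths is made up for illustration) that prints the character count and the UTF-8 byte count side by side. It says nothing about what Splunk does internally; it only shows that an accented vowel is one character but two bytes in UTF-8.

```python
def lengths(s):
    """Return (number of characters, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


for ch in ("a", "á", "à"):
    chars, nbytes = lengths(ch)
    print(f"{ch!r}: {chars} character(s), {nbytes} UTF-8 byte(s)")

# In a single-byte encoding such as Latin-1, the same character is one byte.
print(len("á".encode("latin-1")))
```

If a tool compares the two sides of y/á/a/ by byte count rather than character count, it sees 2 versus 1.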


faguilar
Path Finder

Hi @mayurr98,

Thank you for your answer, but maybe I expressed my problem the wrong way.
It's not a syntax problem, and I don't need that simple substitution (which I already know how to do). That's why I said that I used sed y/àéíóú/aeiou/, which works for my scenario in the Linux terminal.

I want to substitute those characters anywhere in the string, not only when they appear in that exact order. For example, if I have the name

José González

sed y/àéíóú/aeiou/ will substitute it perfectly: á becomes a, é becomes e, and so on.

My problem is that in Splunk, sed mode doesn't seem to work like the Linux sed command.

I will update my question to avoid any ambiguity.


faguilar
Path Finder

For this example search:

| makeresults
| eval data="Juán Pérez Dís Tópú", data1=data
| rex field=data1 mode=sed "y/áéíóú/aaeeiioouu/"
| table data*

This is my output:

data Juán Pérez Dís Tópú
data1 Juaan Paerez Dais Taopau

And if I use the command | rex field=data1 mode=sed "y/áéíóú/aeiou/" the result is:

Error in 'rex' command: Failed to initialize sed. 'áéíóú' and 'aeiou' are different length.
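
One possible reading of these two results is that y/ in rex works on UTF-8 bytes rather than characters. That is an assumption, not something confirmed by the Splunk documentation. The Python sketch below simulates such a byte-wise transliteration; the function name bytewise_y and the rule that the first mapping wins when a byte repeats are both assumptions made for this illustration.

```python
def bytewise_y(text, src, dst):
    """Simulate a y/src/dst/ that maps UTF-8 bytes instead of characters.

    Hypothetical model only; this is not Splunk's actual implementation.
    """
    src_b = src.encode("utf-8")
    dst_b = dst.encode("utf-8")
    # Lengths are compared in bytes, so 'áéíóú' (10 bytes) != 'aeiou' (5 bytes).
    if len(src_b) != len(dst_b):
        raise ValueError(f"'{src}' and '{dst}' are different length.")
    table = {}
    for s, d in zip(src_b, dst_b):
        # Assumption: when a source byte repeats, the first mapping is kept.
        table.setdefault(s, d)
    out = bytes(table.get(b, b) for b in text.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


print(bytewise_y("Juán Pérez Dís Tópú", "áéíóú", "aaeeiioouu"))
```

Under these assumptions, each accented vowel is two bytes, the shared lead byte maps to "a" and the second byte maps to the intended vowel, which reproduces the doubled letters in the output shown above. The five-letter target fails the byte-length check, matching the error message.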
