
How to find similar values in a field?

templier
Communicator

Hello all!
I have an interesting question.
We have the following data in two fields:

a.tudhikova     b-antuzh
a.rusevskaya    a_rusevskaya
a.rusevskaya    alishka92

How can we see that a.rusevskaya and a_rusevskaya are similar?
Question: can we write a search that flags similar values across these two fields?
I understand that the detection will sometimes be wrong; that is not critical.


cmerriman
Super Champion

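Try using the match function in eval. It treats its second argument as a regular expression, so the "." in col1 matches any single character, including the "_" in a_rusevskaya.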
http://docs.splunk.com/Documentation/Splunk/7.1.2/SearchReference/ConditionalFunctions#match.28SUBJE...

|eval similar=if(match(col2,col1),1,0)

Here is sample code using your data above:

|makeresults|eval data="col1='a.tudhikova',col2='b-antuzh' col1='a.rusevskaya',col2='a_rusevskaya' col1='a.rusevskaya',col2='alishka92'"|makemv data|mvexpand data|rename data as _raw|kv|rex mode=sed field=col1 "s/'//g"|rex mode=sed field=col2 "s/'//g"|eval similar=if(match(col2,col1),1,0)

templier
Communicator

@cmerriman hi.
In testing I ran into a problem. I have two addresses:
a.krikun - akrikunart

This pair is not flagged as similar. Can we modify the regex?


cmerriman
Super Champion

You could add an OR statement in the if statement. Haven’t tested that myself yet, though.

|eval similar=if(match(col2,col1) OR match(col1,col2),1,0)
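
A possible variation, offered as an untested sketch and not part of the reply above: strip everything except letters and digits from both fields, lowercase them, and then check whether one normalized value contains the other. The field names n1 and n2 and the inline sample data are assumptions made for illustration only.

```spl
| makeresults
| eval data="a.rusevskaya,a_rusevskaya;a.krikun,akrikunart;v.anasimova,anasimova.v.s"
| makemv delim=";" data
| mvexpand data
| eval col1=mvindex(split(data,","),0)
| eval col2=mvindex(split(data,","),1)
``` n1 and n2 are hypothetical helper fields: lowercase, letters and digits only ```
| eval n1=lower(replace(col1,"[^A-Za-z0-9]",""))
| eval n2=lower(replace(col2,"[^A-Za-z0-9]",""))
``` flag the pair when either normalized value contains the other ```
| eval similar=if(like(n2,"%".n1."%") OR like(n1,"%".n2."%"),1,0)
| table col1 col2 n1 n2 similar
```

With this normalization, a.krikun and akrikunart should be flagged, because "akrikun" is contained in "akrikunart". A reordered pair such as v.anasimova and anasimova.v.s would still not be flagged, so this is only a partial improvement. The triple-backtick inline comments require a Splunk version that supports them; remove those lines on older versions.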

templier
Communicator

I tested it, and it does not work.
We have a lot of email logs from users, and we want to see when a user sends mail to a personal address. Very often they use a similar address, as in the few examples in my first post, and one more:
v.anasimova - anasimova.v.s


templier
Communicator

Hello,
Wow, it worked. Many thanks for the answer.
