Knowledge Management

Fuzzy Logic match with multi word value

mjones414
Contributor

I'm using case() to compare two fields and identify a fuzzy match. Field 1 may contain "boa.com" while field 2 contains "Bank Of America". What I want to do is take the letters of field 1 and match them against the first letter of each word in field 2 (bearing in mind that there is no upper limit on the number of words the value may contain). I know I can usually do something with mvindex, using an index of -1 to get the last value of a multivalue field, but I'm not sure how to marry that with case(like()) and substr(). Has anyone accomplished anything like this before?

 

I'm trying things like | rex field=Company "(?<CamelCase>\b(\w))" but it's only returning "b" in CamelCase instead of "boa".


ITWhisperer
SplunkTrust

Similar to this response, try something like this:

| rex max_match=0 field=field2 "(?<initial>[a-zA-Z])[a-zA-Z]* ?"
| eval webdomain=lower(mvjoin(initial,"")).".com"
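
A quick way to check this on a single test event, as a minimal sketch: `makeresults` generates one empty event and the sample value "Bank Of America" is taken from the question, so both are assumptions rather than part of the reply above. `max_match=0` tells `rex` to keep every match instead of only the first, which is what makes `initial` a multivalue field that `mvjoin` can collapse.

```spl
| makeresults
| eval field2="Bank Of America"
| rex max_match=0 field=field2 "(?<initial>[a-zA-Z])[a-zA-Z]* ?"
| eval webdomain=lower(mvjoin(initial,"")).".com"
| table field2 initial webdomain
```

On this sample, `initial` should hold B, O and A, and `webdomain` should come out as boa.com.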

 

mjones414
Contributor

I was just about to come on here and post that I figured it out, but what I was doing isn't as elegant as what you did.

I did:

| makemv CompanyName
| rex field=CompanyName "(?<CamelCase>\b(\w))"
| eval CamelCase=mvjoin(CamelCase,"")
| nomv CompanyName
| eval DomainMatchesCompany=case(
    like(lower(CompanyName),"%".substr(lower(domain_root),1,3)."%"),"Yes",
    like(lower(CamelCase),"%".substr(lower(domain_root),1,3)."%"),"Yes",
    1=1,"No")
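
For comparison, a hedged sketch of the same check on one test event. It assumes a hypothetical `domain` field holding "boa.com" and derives `domain_root` from it with `split` and `mvindex`. It also swaps the `makemv`/`nomv` pair for `max_match=0` on `rex`, so it is a variant of the search above rather than a copy of it.

```spl
| makeresults
| eval CompanyName="Bank Of America", domain="boa.com"
| eval domain_root=mvindex(split(domain,"."),0)
| rex max_match=0 field=CompanyName "(?<CamelCase>\b\w)"
| eval CamelCase=mvjoin(CamelCase,"")
| eval DomainMatchesCompany=case(
    like(lower(CompanyName),"%".substr(lower(domain_root),1,3)."%"),"Yes",
    like(lower(CamelCase),"%".substr(lower(domain_root),1,3)."%"),"Yes",
    1=1,"No")
| table CompanyName domain_root CamelCase DomainMatchesCompany
```

Note that `substr(...,1,3)` only compares the first three characters of the domain root, so a domain such as boat.com would also match "Bank Of America" here. Comparing the full `domain_root` against `CamelCase` would be stricter.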


I will try your approach and see if I get something similar.


mjones414
Contributor

So as it turns out, with my data, word boundaries and \w work great, but since the string values actually do contain whitespace, I have to convert them to multivalue to get the desired outcome. If I do the pre-processing steps, both of our regular expressions seem to get the job done 🙂 Thanks so much for your reply!
