Solved: Re: Remove common words from two fields and keep u...

samvijay · ‎07-19-2017

Here is an interesting problem, I tried different approaches using regex, mvdbedup, coalesce etc.. it did not work. need guidance from experts.

I have two fields field1 and field2 from a same event, field1 has value of "I want to buy a book" field2 has value of "I want to buy a phone"

As you can see, the content of both the fields are same except the words book and phone. I want the result like below

field1     field2
book      phone

I am simplifying the problem, but in reality each fields can contain a paragraph, but there will be few words which are unique in each field, which I want to extract.

knielsen · ‎07-20-2017

Now that's a fun challenge! 🙂

I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....

Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:

| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2

If you got mixed order of words maybe, then this is an approach:

| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2

but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉

View solution in original post

niketn · ‎07-20-2017

@samvijay, this is answered already. I am just throwing in another option:

| makeresults 
| eval field1="I want to buy a book by today" 
| eval field2="I want to buy a phone by tomorrow"
| eval arrField1=split(field1," ")
| eval arrField2=split(field2," ")
| eval combined=mvzip(arrField1, arrField2)
| table combined
| mvexpand combined
| eval field1=replace(combined,"([^,]+),(.+)","\1")
| eval field2=replace(combined,"([^,]+),(.+)","\2")
| table field1 field2
| where field1!=field2

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

samvijay · ‎07-20-2017

Thanks @niketnilay

knielsen · ‎07-20-2017

Now that's a fun challenge! 🙂

I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....

Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:

| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2

If you got mixed order of words maybe, then this is an approach:

| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now" 
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2

but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉

samvijay · ‎07-20-2017

Thanks @knielsen, the string are guaranteed to have the same order, however there can be many mismatches like below, I need to get them in the same row separated by comma

| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow" 
 | makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2 field3


field1                                   field2
book, now                          phone, tomorrow

samvijay · ‎07-20-2017

Thanks @knielsen, that was really helpful

knielsen · ‎07-20-2017

I'll feel probably stupid when someone posts a pretty solution, but you get the correct result with this:

| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow" 
  | makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?[^,]+),(?.+)" | where NOT field1=field2 | stats list(field1) as field1 list(field2) as field2 | eval field1=mvjoin(field1, ",") | eval field2=mvjoin(field2, ",")

🙂

Remove common words from two fields and keep unique values

Enterprise Security Content Update (ESCU) | New Releases

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

Index This | What are the 12 Days of Splunk-mas?