Here is an interesting problem, I tried different approaches using regex, mvdbedup, coalesce etc.. it did not work. need guidance from experts.
I have two fields field1 and field2 from a same event, field1 has value of "I want to buy a book" field2 has value of "I want to buy a phone"
As you can see, the content of both the fields are same except the words book and phone. I want the result like below
field1 field2
book phone
I am simplifying the problem, but in reality each fields can contain a paragraph, but there will be few words which are unique in each field, which I want to extract.
Now that's a fun challenge! 🙂
I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....
Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:
| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now"
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2
If you got mixed order of words maybe, then this is an approach:
| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now"
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2
but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉
@samvijay, this is answered already. I am just throwing in another option:
| makeresults
| eval field1="I want to buy a book by today"
| eval field2="I want to buy a phone by tomorrow"
| eval arrField1=split(field1," ")
| eval arrField2=split(field2," ")
| eval combined=mvzip(arrField1, arrField2)
| table combined
| mvexpand combined
| eval field1=replace(combined,"([^,]+),(.+)","\1")
| eval field2=replace(combined,"([^,]+),(.+)","\2")
| table field1 field2
| where field1!=field2
Thanks @niketnilay
Now that's a fun challenge! 🙂
I only got very, well, not elegant solutions. I am sure there will be a better answer, but anyway....
Are the strings guaranteed to have the same order of words except the different ones? Then this is an approach:
| makeresults | eval field1="I want to buy a book now" | eval field2="I want to buy a phone now"
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2
If you got mixed order of words maybe, then this is an approach:
| makeresults | eval field1="I want to a book buy now" | eval field2="I want to buy a phone now"
| makemv delim=" " field1 |makemv delim=" " field2 | eval field1sav=field1| mvexpand field1 | eval n=if(match(field2,field1),1,0) | where n=0 | mvexpand field2 | eval n=if(match(field1sav,field2),1,0) | where n=0 | table field1 field2
but that's still not stable for conditions like extra words in either string. But maybe this helps in finding a better solution. I am sure the regulars will jump in later. 😉
Thanks @knielsen, the string are guaranteed to have the same order, however there can be many mismatches like below, I need to get them in the same row separated by comma
| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow"
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?<field1>[^,]+),(?<field2>.+)" | where NOT field1=field2 | table field1 field2 field3
field1 field2
book, now phone, tomorrow
Thanks @knielsen, that was really helpful
I'll feel probably stupid when someone posts a pretty solution, but you get the correct result with this:
| makeresults | eval field1="I want to buy a book may be now" | eval field2="I want to buy a phone may be tomorrow"
| makemv delim=" " field1 |makemv delim=" " field2| eval comb=mvzip(field1, field2) | mvexpand comb | rex field=comb "(?[^,]+),(?.+)" | where NOT field1=field2 | stats list(field1) as field1 list(field2) as field2 | eval field1=mvjoin(field1, ",") | eval field2=mvjoin(field2, ",")
🙂