How do I find the unique elements between two colu...

mbintz · ‎10-14-2016

If I have a search that returns a table with multi-values in two different columns, how can I find the unique elements in one column vs the other?

For example, if my results returns colA and colB I want to add a result column "result" as follows:

key       colA             colB         Result
1     dog cat bird       dog bird        cat
2     red green blue     green blue      red
3     ford chevy dodge   ford chevy     dodge

For what it's worth, colA and colB are the results of two searches and the table is a join, like so:

search key=* | stats values(stuff) as M by key | sort M | 
     join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ]

maciep · ‎10-14-2016

Whenever I work with mv fields, I always feel like I'm going out of my way to get where I'm headed, but this seems to work with some very basic test data...

... | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") 
| mvexpand tmp 
| eventstats count by key tmp 
| eval result = if(count=1,tmp,"") 
| fields - count tmp 
| mvcombine result

So I start by putting colA and colB togehter in one field and then use join/split to ultimately get each value from both fields in one mv field, dups included.
Then I use mvexpand to create one event for each of those values.
Now I can use eventstats to count how many of each value exist for each key.
Then create a new result field that contains the value if there is only one of them (so it only exists in one of the lists), or just an empty string if greater than one.
Finally, get rid some of those unneeded field I created and combine the events back together based on the result field.

Side note, not sure if your search/data is more complicated than it looks, but are you sure you need to be doing a join? Any reason you can't just do stats values(stuff) as M values(otherStuff) as N by key

mbintz · ‎10-15-2016

I had to make a few more tweaks. Because the initial search is doing stats on each set of results, the output of the search (and thus the join) is not a "stream" but a stats table. Despite my belief that my columns were already multi-values, they were not (apparently) and I had to split them. Because it was table, streamstats didn't work. A simple switch to "stats" did the trick,

So the final answer is:

... | eval tmp = mvappend(split(colA, " "), split(colB, " "))
| stats count by key,tmp
| eval result=if(count=1,tmp,"")
| fields - count tmp
mvcombine result

somesoni2 · ‎10-14-2016

This works if I replace | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") with | eval tmp = mvappend(colA,colB) .

sundareshr · ‎10-14-2016

Try this

 search key=* | stats values(stuff) as M by key | sort M | 
      join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ] 
     | eval merged=mvappend(M, N) 
     | eval unique=mvdedup(merged)

mbintz · ‎10-15-2016

I downvoted this post because this is just a dedup'ed union. i'm looking for the elements on cola that are not in colb.

How do I find the unique elements between two columns?

Developer Spotlight with Paul Stout

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

Data-Driven Success: Splunk & Financial Services