Splunk Search

How do I find the unique elements between two columns?

mbintz
Explorer

If I have a search that returns a table with multi-values in two different columns, how can I find the unique elements in one column vs the other?

For example, if my results returns colA and colB I want to add a result column "result" as follows:

key       colA             colB         Result
1     dog cat bird       dog bird        cat
2     red green blue     green blue      red
3     ford chevy dodge   ford chevy     dodge

For what it's worth, colA and colB are the results of two searches and the table is a join, like so:

search key=* | stats values(stuff) as M by key | sort M | 
     join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ]
0 Karma

maciep
Champion

Whenever I work with mv fields, I always feel like I'm going out of my way to get where I'm headed, but this seems to work with some very basic test data...

... | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") 
| mvexpand tmp 
| eventstats count by key tmp 
| eval result = if(count=1,tmp,"") 
| fields - count tmp 
| mvcombine result
  1. So I start by putting colA and colB togehter in one field and then use join/split to ultimately get each value from both fields in one mv field, dups included.

  2. Then I use mvexpand to create one event for each of those values.

  3. Now I can use eventstats to count how many of each value exist for each key.

  4. Then create a new result field that contains the value if there is only one of them (so it only exists in one of the lists), or just an empty string if greater than one.

  5. Finally, get rid some of those unneeded field I created and combine the events back together based on the result field.

Side note, not sure if your search/data is more complicated than it looks, but are you sure you need to be doing a join? Any reason you can't just do stats values(stuff) as M values(otherStuff) as N by key

mbintz
Explorer

I had to make a few more tweaks. Because the initial search is doing stats on each set of results, the output of the search (and thus the join) is not a "stream" but a stats table. Despite my belief that my columns were already multi-values, they were not (apparently) and I had to split them. Because it was table, streamstats didn't work. A simple switch to "stats" did the trick,

So the final answer is:

... | eval tmp = mvappend(split(colA, " "), split(colB, " "))
| stats count by key,tmp
| eval result=if(count=1,tmp,"")
| fields - count tmp
mvcombine result

somesoni2
Revered Legend

This works if I replace | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") with | eval tmp = mvappend(colA,colB) .

sundareshr
Legend

Try this

 search key=* | stats values(stuff) as M by key | sort M | 
      join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ] 
     | eval merged=mvappend(M, N) 
     | eval unique=mvdedup(merged)

mbintz
Explorer

I downvoted this post because this is just a dedup'ed union. i'm looking for the elements on cola that are not in colb.

0 Karma
Get Updates on the Splunk Community!

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud  In today’s fast-paced digital ...

Observability protocols to know about

Observability protocols define the specifications or formats for collecting, encoding, transporting, and ...

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...