Splunk Search

How do I find the unique elements between two columns?

mbintz
Explorer

If I have a search that returns a table with multi-values in two different columns, how can I find the unique elements in one column vs the other?

For example, if my results returns colA and colB I want to add a result column "result" as follows:

key       colA             colB         Result
1     dog cat bird       dog bird        cat
2     red green blue     green blue      red
3     ford chevy dodge   ford chevy     dodge

For what it's worth, colA and colB are the results of two searches and the table is a join, like so:

search key=* | stats values(stuff) as M by key | sort M | 
     join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ]
0 Karma

maciep
Champion

Whenever I work with mv fields, I always feel like I'm going out of my way to get where I'm headed, but this seems to work with some very basic test data...

... | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") 
| mvexpand tmp 
| eventstats count by key tmp 
| eval result = if(count=1,tmp,"") 
| fields - count tmp 
| mvcombine result
  1. So I start by putting colA and colB togehter in one field and then use join/split to ultimately get each value from both fields in one mv field, dups included.

  2. Then I use mvexpand to create one event for each of those values.

  3. Now I can use eventstats to count how many of each value exist for each key.

  4. Then create a new result field that contains the value if there is only one of them (so it only exists in one of the lists), or just an empty string if greater than one.

  5. Finally, get rid some of those unneeded field I created and combine the events back together based on the result field.

Side note, not sure if your search/data is more complicated than it looks, but are you sure you need to be doing a join? Any reason you can't just do stats values(stuff) as M values(otherStuff) as N by key

mbintz
Explorer

I had to make a few more tweaks. Because the initial search is doing stats on each set of results, the output of the search (and thus the join) is not a "stream" but a stats table. Despite my belief that my columns were already multi-values, they were not (apparently) and I had to split them. Because it was table, streamstats didn't work. A simple switch to "stats" did the trick,

So the final answer is:

... | eval tmp = mvappend(split(colA, " "), split(colB, " "))
| stats count by key,tmp
| eval result=if(count=1,tmp,"")
| fields - count tmp
mvcombine result

somesoni2
Revered Legend

This works if I replace | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") with | eval tmp = mvappend(colA,colB) .

sundareshr
Legend

Try this

 search key=* | stats values(stuff) as M by key | sort M | 
      join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ] 
     | eval merged=mvappend(M, N) 
     | eval unique=mvdedup(merged)

mbintz
Explorer

I downvoted this post because this is just a dedup'ed union. i'm looking for the elements on cola that are not in colb.

0 Karma
Got questions? Get answers!

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Meet up IRL or virtually!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Get Updates on the Splunk Community!

Agent Mode Engaged! Enchaining Agentic Operations with Splunk AI Assistant 2.0

    Are you ready to transform how your team handles complex data requests? We invite you to our upcoming ...

Announcing Modern Navigation: A New Era of Splunk User Experience

We are excited to introduce the Modern Navigation feature in the Splunk Platform, available to both cloud and ...

Modernize your Splunk Apps – Introducing Python 3.13 in Splunk

We are excited to announce that the upcoming releases of Splunk Enterprise 10.2.x and Splunk Cloud Platform ...