Splunk Search

How do I find the unique elements between two columns?

mbintz
Explorer

If I have a search that returns a table with multi-values in two different columns, how can I find the unique elements in one column vs the other?

For example, if my results returns colA and colB I want to add a result column "result" as follows:

key       colA             colB         Result
1     dog cat bird       dog bird        cat
2     red green blue     green blue      red
3     ford chevy dodge   ford chevy     dodge

For what it's worth, colA and colB are the results of two searches and the table is a join, like so:

search key=* | stats values(stuff) as M by key | sort M | 
     join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ]
0 Karma

maciep
Champion

Whenever I work with mv fields, I always feel like I'm going out of my way to get where I'm headed, but this seems to work with some very basic test data...

... | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") 
| mvexpand tmp 
| eventstats count by key tmp 
| eval result = if(count=1,tmp,"") 
| fields - count tmp 
| mvcombine result
  1. So I start by putting colA and colB togehter in one field and then use join/split to ultimately get each value from both fields in one mv field, dups included.

  2. Then I use mvexpand to create one event for each of those values.

  3. Now I can use eventstats to count how many of each value exist for each key.

  4. Then create a new result field that contains the value if there is only one of them (so it only exists in one of the lists), or just an empty string if greater than one.

  5. Finally, get rid some of those unneeded field I created and combine the events back together based on the result field.

Side note, not sure if your search/data is more complicated than it looks, but are you sure you need to be doing a join? Any reason you can't just do stats values(stuff) as M values(otherStuff) as N by key

mbintz
Explorer

I had to make a few more tweaks. Because the initial search is doing stats on each set of results, the output of the search (and thus the join) is not a "stream" but a stats table. Despite my belief that my columns were already multi-values, they were not (apparently) and I had to split them. Because it was table, streamstats didn't work. A simple switch to "stats" did the trick,

So the final answer is:

... | eval tmp = mvappend(split(colA, " "), split(colB, " "))
| stats count by key,tmp
| eval result=if(count=1,tmp,"")
| fields - count tmp
mvcombine result

somesoni2
Revered Legend

This works if I replace | eval tmp = split(mvjoin(mvzip(colA,colB),","),",") with | eval tmp = mvappend(colA,colB) .

sundareshr
Legend

Try this

 search key=* | stats values(stuff) as M by key | sort M | 
      join type=left key [ search key=* | stats values(otherstuff)  as N | sort N ] 
     | eval merged=mvappend(M, N) 
     | eval unique=mvdedup(merged)

mbintz
Explorer

I downvoted this post because this is just a dedup'ed union. i'm looking for the elements on cola that are not in colb.

0 Karma
Get Updates on the Splunk Community!

Developer Spotlight with Paul Stout

Welcome to our very first developer spotlight release series where we'll feature some awesome Splunk ...

State of Splunk Careers 2024: Maximizing Career Outcomes and the Continued Value of ...

For the past four years, Splunk has partnered with Enterprise Strategy Group to conduct a survey that gauges ...

Data-Driven Success: Splunk & Financial Services

Splunk streamlines the process of extracting insights from large volumes of data. In this fast-paced world, ...