If I have a search that returns a table with multi-values in two different columns, how can I find the unique elements in one column vs the other?
For example, if my results returns colA and colB I want to add a result column "result" as follows:
key colA colB Result
1 dog cat bird dog bird cat
2 red green blue green blue red
3 ford chevy dodge ford chevy dodge
For what it's worth, colA and colB are the results of two searches and the table is a join, like so:
search key=* | stats values(stuff) as M by key | sort M |
join type=left key [ search key=* | stats values(otherstuff) as N | sort N ]
Whenever I work with mv fields, I always feel like I'm going out of my way to get where I'm headed, but this seems to work with some very basic test data...
... | eval tmp = split(mvjoin(mvzip(colA,colB),","),",")
| mvexpand tmp
| eventstats count by key tmp
| eval result = if(count=1,tmp,"")
| fields - count tmp
| mvcombine result
So I start by putting colA and colB togehter in one field and then use join/split to ultimately get each value from both fields in one mv field, dups included.
Then I use mvexpand to create one event for each of those values.
Now I can use eventstats to count how many of each value exist for each key.
Then create a new result field that contains the value if there is only one of them (so it only exists in one of the lists), or just an empty string if greater than one.
Finally, get rid some of those unneeded field I created and combine the events back together based on the result field.
Side note, not sure if your search/data is more complicated than it looks, but are you sure you need to be doing a join? Any reason you can't just do stats values(stuff) as M values(otherStuff) as N by key
I had to make a few more tweaks. Because the initial search is doing stats on each set of results, the output of the search (and thus the join) is not a "stream" but a stats table. Despite my belief that my columns were already multi-values, they were not (apparently) and I had to split them. Because it was table, streamstats didn't work. A simple switch to "stats" did the trick,
So the final answer is:
... | eval tmp = mvappend(split(colA, " "), split(colB, " "))
| stats count by key,tmp
| eval result=if(count=1,tmp,"")
| fields - count tmp
mvcombine result
This works if I replace | eval tmp = split(mvjoin(mvzip(colA,colB),","),",")
with | eval tmp = mvappend(colA,colB)
.
Try this
search key=* | stats values(stuff) as M by key | sort M |
join type=left key [ search key=* | stats values(otherstuff) as N | sort N ]
| eval merged=mvappend(M, N)
| eval unique=mvdedup(merged)
I downvoted this post because this is just a dedup'ed union. i'm looking for the elements on cola that are not in colb.