Splunk Search

Search help: remove intersection of two sets from the first set

Genti
Splunk Employee
Splunk Employee

A customer asked this search question a few days ago. I thought it was a good one for answers.

Assume you have two datasets where you can use a JOIN to find the intersection (based on a field). How can i then modify the search to list the events in the first data set that do not also show up in the second dataset?

Here's a simple example:

set1.log
May 31 23:00:01 MAC=a
May 31 23:00:05 MAC=b
May 31 23:00:06 MAC=c
May 31 23:00:07 MAC=d

set2.log
Jun 01 00:00:01 MAC=d
Jun 01 00:00:05 MAC=e
Jun 01 00:00:06 MAC=f
Jun 01 00:00:07 MAC=g

Using:
source=set1.log | JOIN MAC [search set2.log] i should get the intersection: d

What should my search be, in order to get set1 - intersection of set1 and set2 (in other words, set1 - set2)?

0 Karma
1 Solution

Genti
Splunk Employee
Splunk Employee

This can be achieved by the following search:

source="set1.log" | JOIN type=left MAC [search source="set2.log" | eval x=1] | Where NOT x=1

Explanation:

the subsearch will find events in set2.log and add a new field, x=1 to the event. (that is for D, E, F, G, the field x=1 will be associated to the events)

Then, were doing a LEFT JOIN, on the field MAC (which will return A B C D) , but we are leaving out those events for which x=1 (hence we are leaving out D).

The final result then becomes: A, B and C, that is, set1 - set2

View solution in original post

0 Karma

kristian_kolb
Ultra Champion

I'd assume that the following would be more efficient, but I haven't tried it out.

source=set1.log NOT [search source=set2.log | dedup MAC | fields + MAC ]

i.e find all MACs from set2, then show all events from set1 whose MAC addresses do not match the subsearch.

/kristian

0 Karma

Genti
Splunk Employee
Splunk Employee

This can be achieved by the following search:

source="set1.log" | JOIN type=left MAC [search source="set2.log" | eval x=1] | Where NOT x=1

Explanation:

the subsearch will find events in set2.log and add a new field, x=1 to the event. (that is for D, E, F, G, the field x=1 will be associated to the events)

Then, were doing a LEFT JOIN, on the field MAC (which will return A B C D) , but we are leaving out those events for which x=1 (hence we are leaving out D).

The final result then becomes: A, B and C, that is, set1 - set2

0 Karma
Get Updates on the Splunk Community!

Changes to Splunk Instructor-Led Training Completion Criteria

We’re excited to share an update to our instructor-led training program that enhances the learning experience ...

Stay Connected: Your Guide to January Tech Talks, Office Hours, and Webinars!

❄️ Welcome the new year with our January lineup of Community Office Hours, Tech Talks, and Webinars! 🎉 ...

Preparing your Splunk Environment for OpenSSL3

The Splunk platform will transition to OpenSSL version 3 in a future release. Actions are required to prepare ...