Getting Data In

Union / Intersect results from different source type

clincg
Path Finder

Hi - I'm trying to union/intersect results from different source type using the SET command:

set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

However, it's not returning any results, even though if I run the search separately, it will return the results that I want.

Does anyone have experience with the SET command? The documentation for the SET command doesn't really have any details explanations. If it's not possible to union/intersect results from different source type, what would be a good way to correlate the data?

Thanks!

Tags (1)
0 Karma
1 Solution

bwooden
Splunk Employee
Splunk Employee

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

View solution in original post

Stephen_Sorkin
Splunk Employee
Splunk Employee

This is not an efficient way to solve this problem. With Splunk, it's much, much better to have the work done in a single search pass with an OR than to run two searches and combine the results.

Here you should search:

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | fields url
(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count by url

Now the efficient version of intersect is very similar. Here we count the number of sourcetypes per url and only keep those urls with more than 1 sourcetype (hence both the sourcetypes):

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count dc(sourcetype) as sourcetypes by url | search sourcetypes > 1 

These searches can (and will) return an unlimited number of rows, and statistics will be accurate on the entire result set. Using set will only pull a limited number of rows from each search.

thambisetty
SplunkTrust
SplunkTrust

Hi Stephen,

I have two big data sets. one has 300 million records with only one field and its a hash. i have another data set which has 20 million records with 3 fields username and hashes. I would like to compare these two sets to find any hashes are matching in two data sets. is there any effective way to do this except the above two solutions as I have already tried them and they are not effective.

————————————
If this helps, give a like below.
0 Karma

e_sherlock
Explorer

Makes sense; however, if the two different sourcetypes have different fields (say "url1" and "url2") that I want to union into one field, what's the best way to do that?

I'm trying to avoid this delimiter solution below if possible...
yoursearchhere |
eval output = field1 + ";" + field2 |
makemv delim=";" output |
mvexpand output

0 Karma

Lowell
Super Champion
 | set union [search sourcetype="first_source" 404 | fields url | fields - _* ] [search sourcetype="second_source" 303 | fields url | fields - _* ]

It's weird, by by default set will try to compare other hidden fields too, such as _time and _raw which you probably do not want.

The above search should take care of this.


Update: I've mentioned this to the doc team, and they have now noted this on the set documentation page.

bwooden
Splunk Employee
Splunk Employee

Thanks not just for the follow-up on Answers but improved documentation as well!

0 Karma

bwooden
Splunk Employee
Splunk Employee

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

clincg
Path Finder

Thanks, that seems to return some results. However, when I changed the "union" to "intersect", it returned 0 results. Is the intersect operation only comparing the fields specified in the " | fields " section, or does it actually compare the whole event entry and only mark it as intersect if the two events are exactly the same?

0 Karma
Get Updates on the Splunk Community!

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud

Introduction to Splunk Observability Cloud - Building a Resilient Hybrid Cloud  In today’s fast-paced digital ...

Observability protocols to know about

Observability protocols define the specifications or formats for collecting, encoding, transporting, and ...

Take Your Breath Away with Splunk Risk-Based Alerting (RBA)

WATCH NOW!The Splunk Guide to Risk-Based Alerting is here to empower your SOC like never before. Join Haylee ...