Solved: Union / Intersect results from different source ty...

clincg · ‎08-04-2010

Hi - I'm trying to union/intersect results from different source type using the SET command:

set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

However, it's not returning any results, even though if I run the search separately, it will return the results that I want.

Does anyone have experience with the SET command? The documentation for the SET command doesn't really have any details explanations. If it's not possible to union/intersect results from different source type, what would be a good way to correlate the data?

Thanks!

bwooden · ‎08-04-2010

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

View solution in original post

Stephen_Sorkin · ‎08-20-2010

This is not an efficient way to solve this problem. With Splunk, it's much, much better to have the work done in a single search pass with an OR than to run two searches and combine the results.

Here you should search:

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | fields url
(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count by url

Now the efficient version of intersect is very similar. Here we count the number of sourcetypes per url and only keep those urls with more than 1 sourcetype (hence both the sourcetypes):

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count dc(sourcetype) as sourcetypes by url | search sourcetypes > 1

These searches can (and will) return an unlimited number of rows, and statistics will be accurate on the entire result set. Using set will only pull a limited number of rows from each search.

thambisetty · ‎08-10-2017

Hi Stephen,

I have two big data sets. one has 300 million records with only one field and its a hash. i have another data set which has 20 million records with 3 fields username and hashes. I would like to compare these two sets to find any hashes are matching in two data sets. is there any effective way to do this except the above two solutions as I have already tried them and they are not effective.

————————————
If this helps, give a like below.

e_sherlock · ‎10-08-2012

Makes sense; however, if the two different sourcetypes have different fields (say "url1" and "url2") that I want to union into one field, what's the best way to do that?

I'm trying to avoid this delimiter solution below if possible...
yoursearchhere |
eval output = field1 + ";" + field2 |
makemv delim=";" output |
mvexpand output

Lowell · ‎08-05-2010

 | set union [search sourcetype="first_source" 404 | fields url | fields - _* ] [search sourcetype="second_source" 303 | fields url | fields - _* ]

It's weird, by by default set will try to compare other hidden fields too, such as _time and _raw which you probably do not want.

The above search should take care of this.

Update: I've mentioned this to the doc team, and they have now noted this on the set documentation page.

bwooden · ‎08-06-2010

Thanks not just for the follow-up on Answers but improved documentation as well!

bwooden · ‎08-04-2010

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

clincg · ‎08-04-2010

Thanks, that seems to return some results. However, when I changed the "union" to "intersect", it returned 0 results. Is the intersect operation only comparing the fields specified in the " | fields " section, or does it actually compare the whole event entry and only mark it as intersect if the two events are exactly the same?

Union / Intersect results from different source type

Join the Splunk Community Slack to learn, troubleshoot, and make connections with fellow Splunk practitioners in real time!

Join Splunk User Groups to connect and learn in-person by region or remotely by topic or industry.

Quick connection discovery mode for forwarders

Build and Launch AI Agents from Your Splunk Workflows

Splunk Cloud Application Management in Terraform

Join the Conversation