Getting Data In

Union / Intersect results from different source type

clincg
Path Finder

Hi - I'm trying to union/intersect results from different source type using the SET command:

set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

However, it's not returning any results, even though if I run the search separately, it will return the results that I want.

Does anyone have experience with the SET command? The documentation for the SET command doesn't really have any details explanations. If it's not possible to union/intersect results from different source type, what would be a good way to correlate the data?

Thanks!

Tags (1)
0 Karma
1 Solution

bwooden
Splunk Employee
Splunk Employee

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

View solution in original post

Stephen_Sorkin
Splunk Employee
Splunk Employee

This is not an efficient way to solve this problem. With Splunk, it's much, much better to have the work done in a single search pass with an OR than to run two searches and combine the results.

Here you should search:

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | fields url
(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count by url

Now the efficient version of intersect is very similar. Here we count the number of sourcetypes per url and only keep those urls with more than 1 sourcetype (hence both the sourcetypes):

(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count dc(sourcetype) as sourcetypes by url | search sourcetypes > 1 

These searches can (and will) return an unlimited number of rows, and statistics will be accurate on the entire result set. Using set will only pull a limited number of rows from each search.

thambisetty
SplunkTrust
SplunkTrust

Hi Stephen,

I have two big data sets. one has 300 million records with only one field and its a hash. i have another data set which has 20 million records with 3 fields username and hashes. I would like to compare these two sets to find any hashes are matching in two data sets. is there any effective way to do this except the above two solutions as I have already tried them and they are not effective.

————————————
If this helps, give a like below.
0 Karma

e_sherlock
Explorer

Makes sense; however, if the two different sourcetypes have different fields (say "url1" and "url2") that I want to union into one field, what's the best way to do that?

I'm trying to avoid this delimiter solution below if possible...
yoursearchhere |
eval output = field1 + ";" + field2 |
makemv delim=";" output |
mvexpand output

0 Karma

Lowell
Super Champion
 | set union [search sourcetype="first_source" 404 | fields url | fields - _* ] [search sourcetype="second_source" 303 | fields url | fields - _* ]

It's weird, by by default set will try to compare other hidden fields too, such as _time and _raw which you probably do not want.

The above search should take care of this.


Update: I've mentioned this to the doc team, and they have now noted this on the set documentation page.

bwooden
Splunk Employee
Splunk Employee

Thanks not just for the follow-up on Answers but improved documentation as well!

0 Karma

bwooden
Splunk Employee
Splunk Employee

The set command needs a leading pipe to distinguish it as a search command and not a search term...

| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]

clincg
Path Finder

Thanks, that seems to return some results. However, when I changed the "union" to "intersect", it returned 0 results. Is the intersect operation only comparing the fields specified in the " | fields " section, or does it actually compare the whole event entry and only mark it as intersect if the two events are exactly the same?

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...