Hi - I'm trying to union/intersect results from different source type using the SET command:
set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]
However, it's not returning any results, even though if I run the search separately, it will return the results that I want.
Does anyone have experience with the SET command? The documentation for the SET command doesn't really have any details explanations. If it's not possible to union/intersect results from different source type, what would be a good way to correlate the data?
The set command needs a leading pipe to distinguish it as a search command and not a search term...
| set union [search sourcetype="first_source" 404 | fields url] [search sourcetype="second_source" 303 | fields url]
Thanks, that seems to return some results. However, when I changed the "union" to "intersect", it returned 0 results. Is the intersect operation only comparing the fields specified in the " | fields " section, or does it actually compare the whole event entry and only mark it as intersect if the two events are exactly the same?
| set union [search sourcetype="first_source" 404 | fields url | fields - _* ] [search sourcetype="second_source" 303 | fields url | fields - _* ]
It's weird, by by default set will try to compare other hidden fields too, such as
_raw which you probably do not want.
The above search should take care of this.
Update: I've mentioned this to the doc team, and they have now noted this on the set documentation page.
This is not an efficient way to solve this problem. With Splunk, it's much, much better to have the work done in a single search pass with an OR than to run two searches and combine the results.
Here you should search:
(sourcetype=first_source 404) OR (sourcetype=second_source 303) | fields url (sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count by url
Now the efficient version of intersect is very similar. Here we count the number of sourcetypes per url and only keep those urls with more than 1 sourcetype (hence both the sourcetypes):
(sourcetype=first_source 404) OR (sourcetype=second_source 303) | stats count dc(sourcetype) as sourcetypes by url | search sourcetypes > 1
These searches can (and will) return an unlimited number of rows, and statistics will be accurate on the entire result set. Using
set will only pull a limited number of rows from each search.
Makes sense; however, if the two different sourcetypes have different fields (say "url1" and "url2") that I want to union into one field, what's the best way to do that?
I'm trying to avoid this delimiter solution below if possible...
eval output = field1 + ";" + field2 |
makemv delim=";" output |
I have two big data sets. one has 300 million records with only one field and its a hash. i have another data set which has 20 million records with 3 fields username and hashes. I would like to compare these two sets to find any hashes are matching in two data sets. is there any effective way to do this except the above two solutions as I have already tried them and they are not effective.