Splunk Search

Why are the results of set diff and join different?

munang
Path Finder

Each of the two lookups has URL information.
And I queried it like this:

 

1) 

| set diff
[| inputlookup test.csv]
[| inputlookup test2.csv]

 

2) 

| inputlookup test.csv
| join type=outer url
    [| inputlookup test2.csv
     | eval is_test2_log=1]
| where isnull(is_test2_log)

 

 

The two results are different and the actual correct answer is number 2.


In case 1, there are 200 results, in case 2 there are 300 results.

I don't know why the 2 results are different.

Or even if they are different, shouldn't there be more results from number 1?

 

Labels (2)

tscroggins
Influencer

Hi @munang,

The set command and the join command perform overlapping but different functions.

set diff returns the symmetric difference of the subsearches:

tscroggins_0-1733070960903.png

I.e. set diff returns all events in either subsearch A or subsearch B but not both:

A

urlA_field
https://www.splunk.com/A_value
https://www.appdynamics.com/A_value

 

B

urlA_field
https://www.appdynamics.com/A_value
https://www.cisco.com/A_value

 

diff

urlA_field
https://www.splunk.com/A_value
https://www.cisco.com/A_value

 

Both join type=left and join type=outer perform a left outer join by joining all fields in all events in the base search with all fields from the first (default: max=1) matching event in the subsearch:

tscroggins_2-1733071373069.png

I.e.:

A

urlA_field
https://www.splunk.com/A_value
https://www.appdynamics.com/A_value

 

B

urlB_field
https://www.appdynamics.com/B_value1
https://www.appdynamics.com/B_value2

 

join

urlA_fieldB_field
https://www.splunk.com/A_value(null)
https://www.appdynamics.com/A_valueB_value1

 

As written, your join search is equivalent to join type=inner. The where command removes all events from the base search that were not joined to an event in the subsearch.

To return the difference using the join command, the command would need to support a full outer join, and it does not.

tscroggins
Influencer

As an additional exercise, we can compare diff with combinations of inputlookup. The following searches should return the same results:

A

| set diff
[| inputlookup test.csv ]
[| inputlookup test2.csv ]

B

| inputlookup test.csv where NOT [| inputlookup test2.csv ]
| inputlookup append=t test2.csv where NOT [| inputlookup test.csv ]

inventsekar
SplunkTrust
SplunkTrust

Hi @tscroggins and all,
Could you pls check this:

the file http_error_code.csv

StatusCode,Meaning
100,Continue
101,Switching protocols
403,Forbidden
404,Not Found

the file http_error_codes_400.csv

StatusCode,Meaning
400,Bad Request
401,Unauthorized
402,Payment Required
403,Forbidden
404,Not Found

 

join2.PNGset-diff.PNG

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !

bowesmana
SplunkTrust
SplunkTrust

I have not used set diff very often, but the set diff command appears to also look at hidden fields and when doing inputlookup there is a field _mkv_child.

It's possible that this is affecting the results. You could try

| set diff 
[ | inputlookup test.csv | fields url | fields - _* ] 
[ | inputlookup test2.csv | fields url | fields - _* ]

but you can also do it this way, which is really how lookups are intended to work - it's always good to avoid using join in Splunk - it has limitations and joining on lookups is just not how to do things in Splunk

| inputlookup test2.csv
| lookup test.csv url OUTPUT url as found
| where isnull(found)

So this takes test2 and looks for the urls present in test.csv and retains those not found.

 

munang
Path Finder

@bowesmana 

 

The two methods you provided also produce different results.

Is there a reason why I get different results even if I exclude fields with underscores?

0 Karma

inventsekar
SplunkTrust
SplunkTrust

Hi @munang 

May i know if Splunk Enterprise or Splunk Cloud and the Splunk version number pls. 

Let me try to recreate this situation and see how do we troubleshoot this issue. thanks. 

thanks and best regards,
Sekar

PS - If this or any post helped you in any way, pls consider upvoting, thanks for reading !
0 Karma

munang
Path Finder

@inventsekar 

 

Sorry for the late reply.

My splunk enterprise version is 9.1.0.2.

 

0 Karma
Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Community Content Calendar, September edition

Welcome to another insightful post from our Community Content Calendar! We're thrilled to continue bringing ...

Splunkbase Unveils New App Listing Management Public Preview

Splunkbase Unveils New App Listing Management Public PreviewWe're thrilled to announce the public preview of ...

Leveraging Automated Threat Analysis Across the Splunk Ecosystem

Are you leveraging automation to its fullest potential in your threat detection strategy?Our upcoming Security ...