Each of the two lookups has URL information.
And I queried it like this:
1)
| set diff
[| inputlookup test.csv]
[| inputlookup test2.csv]
2)
| inputlookup test.csv
| join type=outer url
[| inputlookup test2.csv
| eval is_test2_log=1]
| where isnull(is_test2_log)
The two results are different and the actual correct answer is number 2.
In case 1, there are 200 results, in case 2 there are 300 results.
I don't know why the 2 results are different.
Or even if they are different, shouldn't there be more results from number 1?
Hi @munang,
The set command and the join command perform overlapping but different functions.
set diff returns the symmetric difference of the subsearches:
I.e. set diff returns all events in either subsearch A or subsearch B but not both:
A
url | A_field |
https://www.splunk.com/ | A_value |
https://www.appdynamics.com/ | A_value |
B
url | A_field |
https://www.appdynamics.com/ | A_value |
https://www.cisco.com/ | A_value |
diff
url | A_field |
https://www.splunk.com/ | A_value |
https://www.cisco.com/ | A_value |
Both join type=left and join type=outer perform a left outer join by joining all fields in all events in the base search with all fields from the first (default: max=1) matching event in the subsearch:
I.e.:
A
url | A_field |
https://www.splunk.com/ | A_value |
https://www.appdynamics.com/ | A_value |
B
url | B_field |
https://www.appdynamics.com/ | B_value1 |
https://www.appdynamics.com/ | B_value2 |
join
url | A_field | B_field |
https://www.splunk.com/ | A_value | (null) |
https://www.appdynamics.com/ | A_value | B_value1 |
As written, your join search is equivalent to join type=inner. The where command removes all events from the base search that were not joined to an event in the subsearch.
To return the difference using the join command, the command would need to support a full outer join, and it does not.
As an additional exercise, we can compare diff with combinations of inputlookup. The following searches should return the same results:
A
| set diff
[| inputlookup test.csv ]
[| inputlookup test2.csv ]
B
| inputlookup test.csv where NOT [| inputlookup test2.csv ]
| inputlookup append=t test2.csv where NOT [| inputlookup test.csv ]
Hi @tscroggins and all,
Could you pls check this:
the file http_error_code.csv
StatusCode,Meaning
100,Continue
101,Switching protocols
403,Forbidden
404,Not Found
the file http_error_codes_400.csv
StatusCode,Meaning
400,Bad Request
401,Unauthorized
402,Payment Required
403,Forbidden
404,Not Found
I have not used set diff very often, but the set diff command appears to also look at hidden fields and when doing inputlookup there is a field _mkv_child.
It's possible that this is affecting the results. You could try
| set diff
[ | inputlookup test.csv | fields url | fields - _* ]
[ | inputlookup test2.csv | fields url | fields - _* ]
but you can also do it this way, which is really how lookups are intended to work - it's always good to avoid using join in Splunk - it has limitations and joining on lookups is just not how to do things in Splunk
| inputlookup test2.csv
| lookup test.csv url OUTPUT url as found
| where isnull(found)
So this takes test2 and looks for the urls present in test.csv and retains those not found.
The two methods you provided also produce different results.
Is there a reason why I get different results even if I exclude fields with underscores?
Hi @munang
May i know if Splunk Enterprise or Splunk Cloud and the Splunk version number pls.
Let me try to recreate this situation and see how do we troubleshoot this issue. thanks.