Splunk Search

Inner Join Returns Different Results When Reversing Search Order

ravencr0ss
New Member

Been working on a proof of concept that seems to be eluding me. From my work with SQL I would expect that an Inner Join would return the same results regardless of which search is the primary and which is the sub. However, this doesn't seem to be the case. Using the first query returns approximately 600 events, whereas if the searches are flipped to the second query it returns more than 3000 events. Am I missing something or is this just an oddity of how joins really work in Splunk?

Query 1 - Approximately 600 results.

sourcetype=mcafee:protection signature=7058
| join type=inner host [search sourcetype=XmlWinEventLog:Security]
| table host, signature_name, Name

Query 2 - More than 3000 results.

sourcetype=XmlWinEventLog:Security
| join type=inner host [search sourcetype=mcafee:protection signature=7058 ]
| table host, signature_name, Name

I have tried both constraining and opening the max values returned, constraining the time, and specifying specific fields in each query to no avail.

0 Karma
1 Solution

DMohn
Motivator

You are misunderstanding the way the join works here...

An inner join doesn't necessarily have to be a 1:1 join, but can be a n:1 or 1:n join - if set so in the options (see parameter max in the join reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join)

So I assume, your first primary search (base search AV data) returns about 600 results, for which the subsearch will return more than one result per host field - but only the first one will be matched, as this is the default setting for the max parameter.
The second primary search however returns around 3000 (or maybe more) results, and it will be a n:1 match, as you may have several entries per host in your WinEventLog which will be joined with the same McAfee log.

Try using join type=inner max=0 and see if the numer of results will be equal - which should be the case if my assumption is correct.

View solution in original post

0 Karma

DMohn
Motivator

You are misunderstanding the way the join works here...

An inner join doesn't necessarily have to be a 1:1 join, but can be a n:1 or 1:n join - if set so in the options (see parameter max in the join reference: https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Join)

So I assume, your first primary search (base search AV data) returns about 600 results, for which the subsearch will return more than one result per host field - but only the first one will be matched, as this is the default setting for the max parameter.
The second primary search however returns around 3000 (or maybe more) results, and it will be a n:1 match, as you may have several entries per host in your WinEventLog which will be joined with the same McAfee log.

Try using join type=inner max=0 and see if the numer of results will be equal - which should be the case if my assumption is correct.

0 Karma

ravencr0ss
New Member

I tried again with the max=0 and as before the results between the two are out of wack, but now I understand why. Originally I was under the impression the match criteria was on a 1:1 not 1:n. Thank you for the insight.

0 Karma
Get Updates on the Splunk Community!

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

Splunk is officially part of Cisco

Revolutionizing how our customers build resilience across their entire digital footprint.   Splunk ...

Splunk APM & RUM | Planned Maintenance March 26 - March 28, 2024

There will be planned maintenance for Splunk APM and RUM between March 26, 2024 and March 28, 2024 as ...