Splunk Search

How to join two searches, consider first and not second (subsearch)?

rodrigobortolon
New Member

I tried to use the NOT command to get the events from the first search but not in the second (subsearch) but in the results, I noticed events from the second search (subsearch). Is that a different way to do this search? I tried to use join type=left and the same issue occurred not bringing the events only regarding the first search not in the second search.

with NOT:

index=ABC sourcetype="perf" CLASS=PL | fields RR_ID USR_ID | dedup RR_ID
    | table tblresult RR_ID USR_ID
    | search NOT[search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID| table tblresult RR_ID]
    | stats count by USR_ID

with Left join:

index=ABC sourcetype="perf" CLASS=PL| fields RR_ID USR_ID | dedup RR_ID
      | join type=left RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID]
      | stats count by USR_ID
0 Karma

diogofgm
SplunkTrust
SplunkTrust

Your 1st search was nearly there I believe:
Try this:

index=ABC sourcetype="perf" CLASS=PL NOT [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID| fields RR_ID]
| dedup RR_ID
| stats count by USR_ID
------------
Hope I was able to help you. If so, an upvote would be appreciated.
0 Karma

rodrigobortolon
New Member

Hi @diogofgm , this approach didn't work for me. It brings the values from second one as well. I am trying to check the first approach with left join and inner join that @Sukisen1981 provide us here. Anyway, thank you!

0 Karma

Sukisen1981
Champion

what if we place another join?
index=ABC sourcetype="perf" CLASS=PL| fields RR_ID USR_ID | dedup RR_ID
| join type=left RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID] | join type=inner RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID]
| stats count by USR_ID

Does the inner one show the common fields and the outer one show only fields from the first search?
I have not tried but what I am expecting is this should give you 2 columns , first with only the first search and then with the common values, see if it is easy after this to exclude the RR_IDs which are present in both columns

0 Karma

rodrigobortolon
New Member

Hi Sukisen1981! Thank you for your reply. I implemented here and apparently works well but I split the searchs to understand why I need to use a new join (inner) after the left join. I thought that the left join will bring only the results in first search not in the second but brings me everything. The query whitout the join inner I try to bring the columns from first search and second but as the reference column is RR_ID, could I rename and bring in distinct columns to check that in the second search I will have values and other raws with no value?

0 Karma

Sukisen1981
Champion

hi @rodrigobortolon
No, the left join does not do what you think, please refer to the documentation (the venn diagrams) here -
https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Join

A left join gives (to use set notation) B complement + A intersection B = Theoretically A , since B complement is nothing but all values of A, except A intersection B and we of course ADD A intersection B to that=Unique values of A + common values
An Inner join gives only A intersection B.
So when your first left join gets performed, what you get is ALL values of the first index AND no unique values from the second.
when your second join gets performed you get ONLY values unique to both A&B (A intersection B).
So my idea was if you can remove those RR_IDs AFTER performing both the joins which are common to both AND WILL BE PRESENT in the output of the first, left join - you got to be left with only values UNIQUE to to the first index...
And yes - you can rename the rrID in the second, inner join just like you did
This problem can also be solved with stats, but I don't want to go down that route for two reasons - I feel you are near the solution + your 'join' fundas are getting unjoined and all clear : )

0 Karma

rodrigobortolon
New Member

@Sukisen1981 , thank you again for our reply! So let me get if I understand. This part (original from my question)

index=ABC sourcetype="perf" CLASS=PL| fields RR_ID USR_ID | dedup RR_ID
| join type=left RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID]

will bring me ALL values of the first index AND no unique values from the second, correct?
The second join that you mentioned "when your second join gets performed you get ONLY values unique to both A&B (A intersection B).", you mean with this additional part (join type=inner) after the left join?

index=ABC sourcetype="perf" CLASS=PL| fields RR_ID USR_ID | dedup RR_ID
| join type=left RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID] | join type=inner RR_ID [search index=CAB sourcetype=CC | fields rrId| rename rrId AS RR_ID]
| stats count by USR_ID

I am still in this part thinking that with this additional join I will get as you mention the A intersection B, not the final part that you mentioned with only the left (join left results).

0 Karma

Sukisen1981
Champion

hi @rodrigobortolon
You are correct,
I suggest running the 2 joins separately first and validating that we get all events from the first index(including common events in both indexes) for the left join and then validating that we only receive A intersection B events from the inner join
If this works, can you then run the whole query as it is and confirm how the result looks like?
It might be worthwhile to rename as follows:
left join - same as it is
inner join - rename first rrid from main query as something else, rrd_1 and also use the same field rename in inner join
PS- Do you have access to _audit and _internal indexes? I am struggling a bit to better visualize the requirement, and we can kind of perhaps mock this scenario using those 2 indexes

0 Karma
Did you miss .conf21 Virtual?

Good news! The event's keynotes and many of its breakout sessions are now available online, and still totally FREE!