Solved: SOLVED: Alternates of LEFT OUTER JOIN

madhav_dholakia · ‎02-03-2022

Hello,

I have got 2 data sets resides in same index but with different source/host:

index="tickets" host="RMM_DATA"

index="tickets" source="fs_webhooks"

first data set contains multiple ticket fields, including ID. the second data set only contains one field ID.

I want to display all the events where ID value is available only in the 1st data set but not in the 2nd data set.

The below query gives me LEFT JOIN - but I want to get IDs (and related fields) those only exist in the first data set.

index="tickets" host="RMM_DATA"
| sort 0 -_time
| dedup ID
| where DepartmentName!="XYZ" AND DepartmentName!="MNO" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident"
| table ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| join type=left ID [search index="tickets" source="fs_webhooks"
| rename freshdesk_webhook.ticket_id as ID
| sort 0 -_time
| dedup ID 
| table ID]
| table ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"

Can you please suggest how I can achieve this?

Thank you.

gcusello · ‎02-03-2022

Hi @madhav_dholakia,

sorry I misunderstood that you wanted to join the searches,

in this case you could run something like this:

index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID), check=if(source="fs_webhooks","fs_webhooks", "other")
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * dc(check) AS dc_check values(check) AS check BY ID
| search dc_check=1 check="other"

In few words I added the check that the ID is present only in the first search and not in the second,

Ciao.

Giuseppe

View solution in original post

isoutamo · ‎02-03-2022

Hi

here https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi... is how to avoid use command join when you are joining data sets.

r. Ismo

madhav_dholakia · ‎02-03-2022

Thank you very much, @isoutamo for this reference - it is very useful.

gcusello · ‎02-03-2022

Hi @madhav_dholakia,

only one question: are you sure that events with source="fs_webhook" arent still in the host="RMM_DATA" subset?

If you have two differen data subsets, you could use the stats command that's ver faster than join (Splunk isn't a DB, so use join command only when you haven't any other solution!).

So please try something like this:

index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID)
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * BY ID

Ciao.

Giuseppe

madhav_dholakia · ‎02-03-2022

Hi @gcusello,

yes, events with source="fs_webhook" arent in the host="RMM_DATA" subset. I have tried this with COALESCE as you suggested, but it gives me IDs even if it is in both the data sets.

index="tickets" (host="RMM_DATA") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| where DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" AND Status!="Resolved" Priority="Urgent" Type="Incident"
| eval ID=coalesce(ticket_id, ID)
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * BY ID

Below example is what I want to achieve:

Data Set 1:

ID: 123456, 234567, 345678

Data Set 2:

ID: 123456, 345678

Expected Result:

ID: 234567

Thank you.

gcusello · ‎02-03-2022

Hi @madhav_dholakia,

sorry I misunderstood that you wanted to join the searches,

in this case you could run something like this:

index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID), check=if(source="fs_webhooks","fs_webhooks", "other")
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * dc(check) AS dc_check values(check) AS check BY ID
| search dc_check=1 check="other"

In few words I added the check that the ID is present only in the first search and not in the second,

Ciao.

Giuseppe

madhav_dholakia · ‎02-03-2022

Hi @gcusello - with some minor changes in the updated query you shared, I got it working and now getting expected data. Thank you so much.

gcusello · ‎02-03-2022

Hi @madhav_dholakia,

good for you, see next time!

Ciao and happy splunking.

Giuseppe

P.S.: Karma Points are appreciated by all the Contributors 😉

PickleRick · ‎02-03-2022

There is also additional question whether ID values are really unique. And if there are IDs that are in the second set and are not in the first one.

In the simplest case (the sets are disjunctive and the second set is a subset of the first one in terms of ID values), you can just use count to discard the repetitive values

index=tickets host=RMM_DATA or source=fs_webhooks
| stats count values(*) as * by ID
| where count=1

madhav_dholakia · ‎02-03-2022

Hi @PickleRick,

yes, so the first data set is all the tickets raised in an ITSM tool and the second data set is all the tickets marked as SPAM (subset of data set 1).

Both the data sets contain the common field ID but is is not names as ID in the second data set (freshdesk_webhook.ticket_id) so I use "| rename"

Dataset 1 Columns:

ID, DepartmentName Status Priority Type etc.

Dataset 2 Column:

ID

index="freshservice_tickets" (host="PMC_RAPTOR_DATA" "Department Name"!="ABC" AND "Department Name"!="DEF" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id as ID
| stats count values(*) as * by ID

and the above doesn't give the required results. Can you please suggest what I am doing wrong?

Thank you.

PickleRick · ‎02-03-2022

Well, before filtering with the |where command you should get all your tickets.

But in case of your ITSM tickets which are not spam the count should be 1 and in case of the tickets which are spam the count should be 2. Hence the additional condition with |where.

If that's not what you need, check your base search first.

index="freshservice_tickets" (host="PMC_RAPTOR_DATA" "Department Name"!="ABC" AND "Department Name"!="DEF" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident") | stats count values(*) as * by ID

It should give you your tickets in tabelarized form. (and you should count equal to 1 across the whole table if as you say the ID is unique per ticket)

madhav_dholakia · ‎02-03-2022

thank you very much, @PickleRick - before I check your comment, I got this working by referring to @gcusello's comment. but thanks for this, I am sure will use this sometime very soon.

SOLVED: Alternates of LEFT OUTER JOIN

join

subsearch

table

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life