Hello,
I have got 2 data sets resides in same index but with different source/host:
index="tickets" host="RMM_DATA"
index="tickets" source="fs_webhooks"
first data set contains multiple ticket fields, including ID. the second data set only contains one field ID.
I want to display all the events where ID value is available only in the 1st data set but not in the 2nd data set.
The below query gives me LEFT JOIN - but I want to get IDs (and related fields) those only exist in the first data set.
index="tickets" host="RMM_DATA"
| sort 0 -_time
| dedup ID
| where DepartmentName!="XYZ" AND DepartmentName!="MNO" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident"
| table ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| join type=left ID [search index="tickets" source="fs_webhooks"
| rename freshdesk_webhook.ticket_id as ID
| sort 0 -_time
| dedup ID
| table ID]
| table ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
Can you please suggest how I can achieve this?
Thank you.
Hi @madhav_dholakia,
sorry I misunderstood that you wanted to join the searches,
in this case you could run something like this:
index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID), check=if(source="fs_webhooks","fs_webhooks", "other")
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * dc(check) AS dc_check values(check) AS check BY ID
| search dc_check=1 check="other"
In few words I added the check that the ID is present only in the first search and not in the second,
Ciao.
Giuseppe
Hi
here https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi... is how to avoid use command join when you are joining data sets.
r. Ismo
Thank you very much, @isoutamo for this reference - it is very useful.
Hi @madhav_dholakia,
only one question: are you sure that events with source="fs_webhook" arent still in the host="RMM_DATA" subset?
If you have two differen data subsets, you could use the stats command that's ver faster than join (Splunk isn't a DB, so use join command only when you haven't any other solution!).
So please try something like this:
index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID)
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * BY ID
Ciao.
Giuseppe
Hi @gcusello,
yes, events with source="fs_webhook" arent in the host="RMM_DATA" subset. I have tried this with COALESCE as you suggested, but it gives me IDs even if it is in both the data sets.
index="tickets" (host="RMM_DATA") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| where DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" AND Status!="Resolved" Priority="Urgent" Type="Incident"
| eval ID=coalesce(ticket_id, ID)
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * BY ID
Below example is what I want to achieve:
Data Set 1:
ID: 123456, 234567, 345678
Data Set 2:
ID: 123456, 345678
Expected Result:
ID: 234567
Thank you.
Hi @madhav_dholakia,
sorry I misunderstood that you wanted to join the searches,
in this case you could run something like this:
index="tickets" (host="RMM_DATA" DepartmentName!="XYZ" DepartmentName!="MNO" Status!="Closed" Status!="Resolved" Priority="Urgent" Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id AS ticket_id
| eval ID=coalesce(ticket_id, ID), check=if(source="fs_webhooks","fs_webhooks", "other")
| fields ID Type DepartmentName "Created Date" Location Priority Subject Queue Status Analyst "Last Updated"
| stats values(*) AS * dc(check) AS dc_check values(check) AS check BY ID
| search dc_check=1 check="other"
In few words I added the check that the ID is present only in the first search and not in the second,
Ciao.
Giuseppe
Hi @gcusello - with some minor changes in the updated query you shared, I got it working and now getting expected data. Thank you so much.
Hi @madhav_dholakia,
good for you, see next time!
Ciao and happy splunking.
Giuseppe
P.S.: Karma Points are appreciated by all the Contributors 😉
There is also additional question whether ID values are really unique. And if there are IDs that are in the second set and are not in the first one.
In the simplest case (the sets are disjunctive and the second set is a subset of the first one in terms of ID values), you can just use count to discard the repetitive values
index=tickets host=RMM_DATA or source=fs_webhooks
| stats count values(*) as * by ID
| where count=1
Hi @PickleRick,
yes, so the first data set is all the tickets raised in an ITSM tool and the second data set is all the tickets marked as SPAM (subset of data set 1).
Both the data sets contain the common field ID but is is not names as ID in the second data set (freshdesk_webhook.ticket_id) so I use "| rename"
Dataset 1 Columns:
ID, DepartmentName Status Priority Type etc.
Dataset 2 Column:
ID
index="freshservice_tickets" (host="PMC_RAPTOR_DATA" "Department Name"!="ABC" AND "Department Name"!="DEF" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident") OR (source="fs_webhooks")
| rename freshdesk_webhook.ticket_id as ID
| stats count values(*) as * by ID
and the above doesn't give the required results. Can you please suggest what I am doing wrong?
Thank you.
Well, before filtering with the |where command you should get all your tickets.
But in case of your ITSM tickets which are not spam the count should be 1 and in case of the tickets which are spam the count should be 2. Hence the additional condition with |where.
If that's not what you need, check your base search first.
index="freshservice_tickets" (host="PMC_RAPTOR_DATA" "Department Name"!="ABC" AND "Department Name"!="DEF" AND Status!="Closed" AND Status!="Resolved" AND Priority="Urgent" AND Type="Incident") | stats count values(*) as * by ID
It should give you your tickets in tabelarized form. (and you should count equal to 1 across the whole table if as you say the ID is unique per ticket)
thank you very much, @PickleRick - before I check your comment, I got this working by referring to @gcusello's comment. but thanks for this, I am sure will use this sometime very soon.