Splunk Search

Left Join: How to display missing data in a table using Join?

asveturi
Path Finder

Hi There,

I have a requirement where i have an index with two different sources.

index=a sourcetype=a1

index=a sourcetype=a2

Now i have a column in common between these two sourcetypes. (ex: corrlId).

I want to display those records which are in source type a1 but not in a2.

Would some one tell how to achieve this?

 

my rough query which i am working on is this:

index=a sourcetype=a1
| search "*" trackrequest
| eval EDT_time = strftime(_time ,"%Y-%m-%d %H:%M:%S")
| rename a.corrlId as CorrlID, EDT_time as "TimeStamp1"
| join type=left correlId
[search index=a sourcetype=a2
| search "*" trackrequest
| eval EDT_time = strftime(_time ,"%Y-%m-%d %H:%M:%S")
| rename a.corrlId as CorrlID, EDT_time as "TimeStamp2"
]
| table "TimeStamp1", CorrlID, "TimeStamp2"

 

For my query a single record is repeating n number of times in output with out actually giving me the desired result which is giving all distinct missing values.

 

 

 

 

 

 

 

Labels (1)
0 Karma
1 Solution

ownion
Path Finder

Hi @asveturi,

this is my answer form what I've understood.

You have two subsets a1 and a2 and you want to join the two based on the Correlation ID and retrieve all the records that are in a1 but not in a2. I think you can go this way.

index=a sourcetype=a1 <your other filters>
| rename a.corrlId as CorrlID
| table CorrlID <your other fields>
| join type=left CorrlID 
[index=a sourcetype=a2 <your other filters> | rename a.corrlId as CorrlID | table CorrlID <your other fields> | eval present = "yes"]
| where isnull(present)

 so in this way you will only have the Correlation IDs from a1 that are not in a2.

Here the documentation for the "join" command.

View solution in original post

ownion
Path Finder

Hi @asveturi,

this is my answer form what I've understood.

You have two subsets a1 and a2 and you want to join the two based on the Correlation ID and retrieve all the records that are in a1 but not in a2. I think you can go this way.

index=a sourcetype=a1 <your other filters>
| rename a.corrlId as CorrlID
| table CorrlID <your other fields>
| join type=left CorrlID 
[index=a sourcetype=a2 <your other filters> | rename a.corrlId as CorrlID | table CorrlID <your other fields> | eval present = "yes"]
| where isnull(present)

 so in this way you will only have the Correlation IDs from a1 that are not in a2.

Here the documentation for the "join" command.

asveturi
Path Finder

Worked like a charm. Thank you for quick help with query.

0 Karma

bowesmana
SplunkTrust
SplunkTrust

Note that using join in Splunk is often not the best/right solution. Yes it can work, but when it does not, you will not know that it has not joined all the data. It uses subsearches, which have limitations that can affect the join.

Notably the join can not return more than 50,000 results or take longer than 60 seconds by default. In case it does, you will not get all the results from the subsearch.

@ITWhisperer example show how you typically do joins in Splunk using stats, which does not have those limitations.

 

0 Karma

isoutamo
SplunkTrust
SplunkTrust

Hi

you should read this https://community.splunk.com/t5/Splunk-Search/What-is-the-relation-between-the-Splunk-inner-left-joi.... There are described all those joins and how to do those in Splunk SPL.

r. Ismo

asveturi
Path Finder

Appreciate your quick response, Thank you!.

Just wanted to put few more points based on my requirement, I spent lot of time but still could not follow correct steps.

1. I will get '*'  which is for all correlation id in last 24 hrs or any time frame set in splunk dashboard or a particular correlation id as input from the top level text box,

2. My query has to be more generic which should accept to query for all correlation ids(*) or

a particular corrl id(123) and has to go for a search in two index source types and has to return the list of missing ids which are in a1 but not in a2.

Though below is the query i am using based on suggestions, i am still not able to display in tabluar format with missing correlation ids, also i need to put the timestamp from a1. Below is the query and could you please suggest further.

 

index=qa_source sourcetype=app_log OR sourcetype=temp_log
| search "*" trackingInfo
| stats values(_time) as _time values(sourcetype) as sourcetype by trackInfo.correlationId
| where mvcount(sourcetype)=1 AND sourcetype="app_log"
| table trackInfo.correlationId

Note: trackInfo.correlationId is a json format hence used that way.

Appreciate your help and response here. Just checking back since a good amount of time is spent and now running out of time.

 

 

 

 

 

 

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Is trackInfo.correlationId a field that has already been extracted?

index=qa_source sourcetype=app_log OR sourcetype=temp_log
| search trackInfo.correlationId="*"
| stats values(_time) as _time values(sourcetype) as sourcetype by trackInfo.correlationId
| where mvcount(sourcetype)=1 AND sourcetype="app_log"

If not, perhaps you could share some sanitised events in a code block </> and identify which fields have already been extracted?

0 Karma

asveturi
Path Finder

trackInfo is not extracted yet, what i was trying to do is to pass the "$correlation_ID$" that i get from input text box to
"| search "$correlation_ID$" trackingInfo"[here it substitutes to either * for getting all the ids or specific correlation ID] like specified in my 2nd post.


The json event look like below

 

{

\"trackingInfo\":
{\"correlationID\":\"101010\",
\"components\":
[{
{
\"name\":\"test1\",
\"executionTimeInMillis\":\"10\",
\"receivedTimeStamp\":\"2022-08-12\",
\"publishedTimeStamp\":\"2022-08-12\"
},
{
\"name\":\"test2\",
\"executionTimeInMillis\":\"10\",
\"receivedTimeStamp\":\"2022-08-12\",
\"publishedTimeStamp\":\"2022-08-12\"
}
]
}
}

 

in both app_log & temp_log sourcetype.

All i have to do is join only based on correlationID between app_log & temp_log.
find out the missing correlationID in temp_log,but present in app_log and display the correlationID in a table format.

0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
index=qa_source sourcetype=app_log OR sourcetype=temp_log "$correlation_ID$" trackingInfo
| spath trackingInfo.correlationId
| stats values(_time) as _time values(sourcetype) as sourcetype by trackingInfo.correlationId
| where mvcount(sourcetype)=1 AND sourcetype="app_log"
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust
index=a sourcetype=a1 OR sourcetype=a2 trackrequest
| stats values(_time) as _time values(sourcetype) as sourcetype by correlId
| where mvcount(sourcetype)=1 AND sourcetype="a1"
0 Karma
Get Updates on the Splunk Community!

.conf24 | Day 0

Hello Splunk Community! My name is Chris, and I'm based in Canberra, Australia's capital, and I travelled for ...

Enhance Security Visibility with Splunk Enterprise Security 7.1 through Threat ...

 (view in My Videos)Struggling with alert fatigue, lack of context, and prioritization around security ...

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...