Splunk Search

Left join a table and a lookup to show matching and non-matching results

Diana_a
Explorer

Hi everyone!

I am new with Splunk and probably this should be really easy for many of you. 

I am trying to left join a lookup with a source table.

I tried this initially and it looks great but it's not displaying the total number of records contained in the lookup table. I need to display all records in the lookup and to show all matching records and a blank if not found in table1.
The TempTableLookup.csv (lookup table) just has 1 column called "NUMBER" with 7,500 records.
The table1 has NUMBER, ORIGINDATE and other columns which are not needed. Table1 has 360,000 records.

So I run this query but I get 7,479 instead of the total 7500. There's around 20+ records that do not have an ORIGINDATE or the lookup number does not exist in table1.

index=test sourcetype="table1"
| lookup TempTableLookup.csv NUMBER output NUMBER as matched_number
| where isnotnull(matched_number) 
| table NUMBER ORIGINDATE



So I read I need to do an left join so I tried this and it's bringing all 7,500 records I want but it is not bringing back the ORIGINDATE.



Could someone please let me know what I am doing wrong on the second lookup? I know that left joins are not recommended but I cannot think of any other way to give me what I need.

| inputlookup TempTableLookup.csv
| join type=left NUMBER [
search index=test sourcetype="table1"
| dedup NUMBER
| fields NUMBER , ORIGINDATE
]
| table NUMBER ORIGINDATE


The output should look like:

NUMBER     ORIGINDATE
123456       01/10/2025
128544       05/05/2029
and so forth...

I'd appreciate greatly any ideas on how to do this.


Thank you in advance and have a great day,

Diana

 

 

 

Labels (4)
0 Karma
1 Solution

bowesmana
SplunkTrust
SplunkTrust

If I understand correctly, you want all the records from the lookup which contains NUMBER and you want to show ORIGINDATE from the sourcetable data in the index.

You are right in understanding that joins are not the right way to do things in Splunk, stats is the normal way

So, I understand your sourcetable data contains both fields, so, you can do it like this

index=test sourcetype="table1"
| stats count by NUMBER ORIGINDATE
| inputlookup append=t TempTableLookup.csv
| stats values(ORIGINDATE) by NUMBER 

 the first stats is in case you have multiple events in your sourcetable data for number - you could replace that with 

| stats latest(ORIGINDATE) as ORIGINDATE by number

if that is more appropriate for your dataset.

Then the inputlookup appends the lookup table to your existing sourcetable data and then the second stats will "join" the two together around NUMBER, so all the ORIGINDATE values from sourcetable will be combined with all the rows in the lookup, so you end up with ORIGINDATE for those that have values in the data, but empty ORIGINDATE values from the lookup where it was not in the data.

This is often referred to in these forums as proving the negative.

Hope this helps.

View solution in original post

bowesmana
SplunkTrust
SplunkTrust

If I understand correctly, you want all the records from the lookup which contains NUMBER and you want to show ORIGINDATE from the sourcetable data in the index.

You are right in understanding that joins are not the right way to do things in Splunk, stats is the normal way

So, I understand your sourcetable data contains both fields, so, you can do it like this

index=test sourcetype="table1"
| stats count by NUMBER ORIGINDATE
| inputlookup append=t TempTableLookup.csv
| stats values(ORIGINDATE) by NUMBER 

 the first stats is in case you have multiple events in your sourcetable data for number - you could replace that with 

| stats latest(ORIGINDATE) as ORIGINDATE by number

if that is more appropriate for your dataset.

Then the inputlookup appends the lookup table to your existing sourcetable data and then the second stats will "join" the two together around NUMBER, so all the ORIGINDATE values from sourcetable will be combined with all the rows in the lookup, so you end up with ORIGINDATE for those that have values in the data, but empty ORIGINDATE values from the lookup where it was not in the data.

This is often referred to in these forums as proving the negative.

Hope this helps.

Diana_a
Explorer

This worked perfectly! thank you so much!!! 

 

 

Diana

0 Karma

bowesmana
SplunkTrust
SplunkTrust

If the solution works, please mark it as a solution, so others can benefit.

 

0 Karma
Get Updates on the Splunk Community!

Congratulations to the 2025-2026 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Community Feedback

We Want to Hear from You! Share Your Feedback on the Splunk Community   The Splunk Community is built for you ...

Manual Instrumentation with Splunk Observability Cloud: Implementing the ...

In our observability journey so far, we've built comprehensive instrumentation for our Worms in Space ...