Hi everyone!
I am new with Splunk and probably this should be really easy for many of you.
I am trying to left join a lookup with a source table.
I tried this initially and it looks great but it's not displaying the total number of records contained in the lookup table. I need to display all records in the lookup and to show all matching records and a blank if not found in table1.
The TempTableLookup.csv (lookup table) just has 1 column called "NUMBER" with 7,500 records.
The table1 has NUMBER, ORIGINDATE and other columns which are not needed. Table1 has 360,000 records.
So I run this query but I get 7,479 instead of the total 7500. There's around 20+ records that do not have an ORIGINDATE or the lookup number does not exist in table1.
index=test sourcetype="table1"
| lookup TempTableLookup.csv NUMBER output NUMBER as matched_number
| where isnotnull(matched_number)
| table NUMBER ORIGINDATE
So I read I need to do an left join so I tried this and it's bringing all 7,500 records I want but it is not bringing back the ORIGINDATE.
Could someone please let me know what I am doing wrong on the second lookup? I know that left joins are not recommended but I cannot think of any other way to give me what I need.
| inputlookup TempTableLookup.csv
| join type=left NUMBER [
search index=test sourcetype="table1"
| dedup NUMBER
| fields NUMBER , ORIGINDATE
]
| table NUMBER ORIGINDATE
The output should look like:
NUMBER ORIGINDATE
123456 01/10/2025
128544 05/05/2029
and so forth...
I'd appreciate greatly any ideas on how to do this.
Thank you in advance and have a great day,
Diana
If I understand correctly, you want all the records from the lookup which contains NUMBER and you want to show ORIGINDATE from the sourcetable data in the index.
You are right in understanding that joins are not the right way to do things in Splunk, stats is the normal way
So, I understand your sourcetable data contains both fields, so, you can do it like this
index=test sourcetype="table1"
| stats count by NUMBER ORIGINDATE
| inputlookup append=t TempTableLookup.csv
| stats values(ORIGINDATE) by NUMBER
the first stats is in case you have multiple events in your sourcetable data for number - you could replace that with
| stats latest(ORIGINDATE) as ORIGINDATE by number
if that is more appropriate for your dataset.
Then the inputlookup appends the lookup table to your existing sourcetable data and then the second stats will "join" the two together around NUMBER, so all the ORIGINDATE values from sourcetable will be combined with all the rows in the lookup, so you end up with ORIGINDATE for those that have values in the data, but empty ORIGINDATE values from the lookup where it was not in the data.
This is often referred to in these forums as proving the negative.
Hope this helps.
If I understand correctly, you want all the records from the lookup which contains NUMBER and you want to show ORIGINDATE from the sourcetable data in the index.
You are right in understanding that joins are not the right way to do things in Splunk, stats is the normal way
So, I understand your sourcetable data contains both fields, so, you can do it like this
index=test sourcetype="table1"
| stats count by NUMBER ORIGINDATE
| inputlookup append=t TempTableLookup.csv
| stats values(ORIGINDATE) by NUMBER
the first stats is in case you have multiple events in your sourcetable data for number - you could replace that with
| stats latest(ORIGINDATE) as ORIGINDATE by number
if that is more appropriate for your dataset.
Then the inputlookup appends the lookup table to your existing sourcetable data and then the second stats will "join" the two together around NUMBER, so all the ORIGINDATE values from sourcetable will be combined with all the rows in the lookup, so you end up with ORIGINDATE for those that have values in the data, but empty ORIGINDATE values from the lookup where it was not in the data.
This is often referred to in these forums as proving the negative.
Hope this helps.
This worked perfectly! thank you so much!!!
Diana
If the solution works, please mark it as a solution, so others can benefit.