Splunk Search

Comparing Two Files with Eval and Set

pradeepk_splunk
Splunk Employee
Splunk Employee

Hello! I started using Splunk about 3 hours ago and am getting stuck on something that may be very simple:

I have two csv files:

fullname.txt. Contents of file:

full_name, date_of_birth
Homer Simpson, 01-01-1980
Bart Simpson, 02-02-2009
Apu Nahasapeemapetilon, 03-03-1984
Montgomery Burns, 06-06-1881

db_export.csv. Contents of file:

id, first_name, last_name, create_date
001, HoM er, siMpson, 01-01-2018
002, bART, Simp son, 07-01-2018
003, ApU , Nahasapeemapetilon , 09-03-2018
004, Montgomery, bUrns, 11-14-2018

What I'm trying to do: Compare full_name in fullname.txt to first_name+last_name in db_export.csv. Note that there are capitalization and spacing issues in the db_export csv, so I'm attempting to use eval to 'normalize' the fields by creating new values that contain lowercased letters, then comparing those.

Here's what I have so far:

source="fullname.txt" OR source="db-export.csv" | eval spring_full_name = full_name, simps_full_name = first_name." ".last_name, spring_low_name = lower(full_name), simps_low_name = lower(simps_full_name) | set intersect *... this is about the point where I'm getting lost*

I'd also like to concatenate the eval'd full_name field and compare that to the concatenated first_name+last_name eval'd field, due to the spacing issues in db_export.csv. Finally, I would like to have one search show all of the matches and another search show the non-matches(or combined into one search, if possible). Ultimately, I would also like to display a report that contains the full_name and date_of_birth of fullname.txt combined with the id and create_date of db_export.csv

Thanks in advance!

0 Karma
1 Solution

sdchakraborty
Contributor

Hi,

Can you try the below search, Please note that I indexed those files into my main index. I also assumed that in your db export you have only space issues. If any other special character issue is there into db export we may need to handle it through regular expression. Also I assumed your db_export is the master data so the below query will also return names which are present in db export but not in fullname.txt.

index=main source="db_export.csv"
| table id, first_name, last_name, create_date
| eval fullname_formatted = lower(replace(first_name," ","")).lower(replace(last_name," ",""))
| join type="left" fullname_formatted 
    [search index=main source="fullname.csv"
    | table full_name, date_of_birth
    | eval fullname_formatted = lower(replace(full_name," ",""))]

Sid

View solution in original post

sdchakraborty
Contributor

Hi,

Can you try the below search, Please note that I indexed those files into my main index. I also assumed that in your db export you have only space issues. If any other special character issue is there into db export we may need to handle it through regular expression. Also I assumed your db_export is the master data so the below query will also return names which are present in db export but not in fullname.txt.

index=main source="db_export.csv"
| table id, first_name, last_name, create_date
| eval fullname_formatted = lower(replace(first_name," ","")).lower(replace(last_name," ",""))
| join type="left" fullname_formatted 
    [search index=main source="fullname.csv"
    | table full_name, date_of_birth
    | eval fullname_formatted = lower(replace(full_name," ",""))]

Sid

pradeepk_splunk
Splunk Employee
Splunk Employee

Wow...fantastic! That is exactly what I was looking for! Thank you Sid.

0 Karma

sdchakraborty
Contributor

no worries. Have a great day 🙂

0 Karma
Get Updates on the Splunk Community!

CX Day is Coming!

Customer Experience (CX) Day is on October 7th!! We're so excited to bring back another day full of wonderful ...

Strengthen Your Future: A Look Back at Splunk 10 Innovations and .conf25 Highlights!

The Big One: Splunk 10 is Here!  The moment many of you have been waiting for has arrived! We are thrilled to ...

Now Offering the AI Assistant Usage Dashboard in Cloud Monitoring Console

Today, we’re excited to announce the release of a brand new AI assistant usage dashboard in Cloud Monitoring ...