Splunk Search

How to use join to compare results from a csv file?

queryboy
Explorer

this query shows all employees in the company:

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed

I need to compare with a CSV that shows all of the people who did their training, this CSV contains the field Submitted

How would you structure the join to compare if xyz employee is in the CSV of people who completed their training?

 

Labels (5)
0 Karma
1 Solution

yuanliu
SplunkTrust
SplunkTrust

@queryboy The way lookup works is that if the lookup field(s) - in my sample code, Applicant_Email AS primaryWorkEmail, match, the command will output other fields of the matching record.  Therefore, presence of other field(s) is used to detect a match.  Consequentially, the absence of other field(s) indicates absence of a match.

My sample code assumed that every trainee in that CSV file has a Training_Status (that is not zero length).  If this assumption is not true, pick another field that is always populated.  For example, if Applicant_Last_Name is always populated,

 

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Applicant_Last_Name), "not in list", "in list")

 

In short, your CSV must contain at least one more column (other than Applicant_Email) that is fully populated.  If you are uncertain of such a condition in real data, you should add a dummy column to CSV that has a non-zero-length value in every row. (That's what I had done in one extreme condition.) Then, use | eval in_list = if(isnull(dummy), "not in list", "in list")

View solution in original post

queryboy
Explorer

@yuanliu you are awesome! I tested with Applicant_Email for the Eval as you mentioned and is working just fine! Thank you!

0 Karma

queryboy
Explorer

@yuanliu  it gives me all results as not in list, I did this with excel and there are some names that are in list 

| eval in_list = if(isnull(Training_Status), "not in list", "in list")

is there a way to do a eval and check not for training status but only if emails in the lookup(csv) are in the main search?

like : | eval in_list = if(Applicant_Email in primaryWorkEmail), "in list", "not in list") 

Thank you for your help

0 Karma

yuanliu
SplunkTrust
SplunkTrust

@queryboy The way lookup works is that if the lookup field(s) - in my sample code, Applicant_Email AS primaryWorkEmail, match, the command will output other fields of the matching record.  Therefore, presence of other field(s) is used to detect a match.  Consequentially, the absence of other field(s) indicates absence of a match.

My sample code assumed that every trainee in that CSV file has a Training_Status (that is not zero length).  If this assumption is not true, pick another field that is always populated.  For example, if Applicant_Last_Name is always populated,

 

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Applicant_Last_Name), "not in list", "in list")

 

In short, your CSV must contain at least one more column (other than Applicant_Email) that is fully populated.  If you are uncertain of such a condition in real data, you should add a dummy column to CSV that has a non-zero-length value in every row. (That's what I had done in one extreme condition.) Then, use | eval in_list = if(isnull(dummy), "not in list", "in list")

queryboy
Explorer

Thank you for this, do you know how would you structure the query to if any of the Applicant_Email(employe email) is in any of the rows of the primaryWorkEmail(employee email in directory) and basically have a new column saying "in_list" for the ones that have a match and "not in list" for the ones that doesn't have a match? @yuanliu  you are helping me a lot is just that one piece I need to get straight 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

I am not sure what is being asked.  My previous code already added a column named "in_list".  It takes values of either "yes" or "no".  If you want to change the display string, just change those values.

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Training_Status), "not in list", "in list")

 

0 Karma

queryboy
Explorer

I uploaded the CSV to splunk this is the query:

| inputlookup trainingoct10.csv
| dedup Applicant_Email
| table Applicant_First_Name, Applicant_Last_Name, Applicant_Email, County_Searched, Training_Status

how can I structure my query so it will check for each of the names inside this CSV against my main query? and be able to add a field saying "in list" or "not in list"

sorry I am new to splunk and this is getting difficult for me.  
@yuanliu @richgalloway 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

how can I structure my query so it will check for each of the names inside this CSV against my main query? and be able to add a field saying "in list" or "not in list"

Can you explain what "not in list"?  Specifically, if a user is in main search, but not found in CSV.  Is this the definition of "not in list"?

If so, you shouldn't need inputlookup.  The command is lookup.

 

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Training_Status), "no", "yes")

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Set up your CSV as a lookup. (You can do this via Splunk Web.)  The search command for matching is lookup.  If your CSV contains a column called primaryWorkEmail, it is as simple as

| lookup mylookup pimaryWorkEmail

If, on the other hand, the corresponding column in lookup is eMail, for example, do

| lookup mylookup pimaryWorkEmail as eMail

Hope this helps

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Does the CSV contain only the Submitted field?  If so, what are the values of that field (names, email addresses, etc.)?  If the CSV contains other fields, what are they?  Something is needed to correlate the CSV to the indexed data.

---
If this reply helps you, Karma would be appreciated.
0 Karma
Get Updates on the Splunk Community!

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...

New in Observability Cloud - Explicit Bucket Histograms

Splunk introduces native support for histograms as a metric data type within Observability Cloud with Explicit ...