Solved: How to use join to compare results from a csv file...

queryboy · ‎10-10-2022

this query shows all employees in the company:

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed

I need to compare with a CSV that shows all of the people who did their training, this CSV contains the field Submitted

How would you structure the join to compare if xyz employee is in the CSV of people who completed their training?

yuanliu · ‎10-12-2022

@queryboy The way lookup works is that if the lookup field(s) - in my sample code, Applicant_Email AS primaryWorkEmail, match, the command will output other fields of the matching record. Therefore, presence of other field(s) is used to detect a match. Consequentially, the absence of other field(s) indicates absence of a match.

My sample code assumed that every trainee in that CSV file has a Training_Status (that is not zero length). If this assumption is not true, pick another field that is always populated. For example, if Applicant_Last_Name is always populated,

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Applicant_Last_Name), "not in list", "in list")

In short, your CSV must contain at least one more column (other than Applicant_Email) that is fully populated. If you are uncertain of such a condition in real data, you should add a dummy column to CSV that has a non-zero-length value in every row. (That's what I had done in one extreme condition.) Then, use | eval in_list = if(isnull(dummy), "not in list", "in list").

View solution in original post

queryboy · ‎10-12-2022

@yuanliu you are awesome! I tested with Applicant_Email for the Eval as you mentioned and is working just fine! Thank you!

queryboy · ‎10-12-2022

@yuanliu it gives me all results as not in list, I did this with excel and there are some names that are in list

| eval in_list = if(isnull(Training_Status), "not in list", "in list")

is there a way to do a eval and check not for training status but only if emails in the lookup(csv) are in the main search?

like : | eval in_list = if(Applicant_Email in primaryWorkEmail), "in list", "not in list")

Thank you for your help

yuanliu · ‎10-12-2022

@queryboy The way lookup works is that if the lookup field(s) - in my sample code, Applicant_Email AS primaryWorkEmail, match, the command will output other fields of the matching record. Therefore, presence of other field(s) is used to detect a match. Consequentially, the absence of other field(s) indicates absence of a match.

My sample code assumed that every trainee in that CSV file has a Training_Status (that is not zero length). If this assumption is not true, pick another field that is always populated. For example, if Applicant_Last_Name is always populated,

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Applicant_Last_Name), "not in list", "in list")

In short, your CSV must contain at least one more column (other than Applicant_Email) that is fully populated. If you are uncertain of such a condition in real data, you should add a dummy column to CSV that has a non-zero-length value in every row. (That's what I had done in one extreme condition.) Then, use | eval in_list = if(isnull(dummy), "not in list", "in list").

queryboy · ‎10-11-2022

Thank you for this, do you know how would you structure the query to if any of the Applicant_Email(employe email) is in any of the rows of the primaryWorkEmail(employee email in directory) and basically have a new column saying "in_list" for the ones that have a match and "not in list" for the ones that doesn't have a match? @yuanliu you are helping me a lot is just that one piece I need to get straight

yuanliu · ‎10-12-2022

I am not sure what is being asked. My previous code already added a column named "in_list". It takes values of either "yes" or "no". If you want to change the display string, just change those values.

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Training_Status), "not in list", "in list")

queryboy · ‎10-10-2022

I uploaded the CSV to splunk this is the query:

| inputlookup trainingoct10.csv
| dedup Applicant_Email
| table Applicant_First_Name, Applicant_Last_Name, Applicant_Email, County_Searched, Training_Status

how can I structure my query so it will check for each of the names inside this CSV against my main query? and be able to add a field saying "in list" or "not in list"

sorry I am new to splunk and this is getting difficult for me.
@yuanliu @richgalloway

yuanliu · ‎10-10-2022

how can I structure my query so it will check for each of the names inside this CSV against my main query? and be able to add a field saying "in list" or "not in list"

Can you explain what "not in list"? Specifically, if a user is in main search, but not found in CSV. Is this the definition of "not in list"?

If so, you shouldn't need inputlookup. The command is lookup.

index=EmployeeData AND sourcetype=Directory* earliest=@d
| search NOT Hire_Date IN ("2021-11-11","2021-05-22","2022-08-02", "2021-11-11")
| search Worker !="Level05" Termination_Date="" Training_Performed=""
| table First, Last, primaryWorkEmail, Training_Performed
| lookup trainingoct10.csv Applicant_Email AS primaryWorkEmail
| eval in_list = if(isnull(Training_Status), "no", "yes")

yuanliu · ‎10-10-2022

Set up your CSV as a lookup. (You can do this via Splunk Web.) The search command for matching is lookup. If your CSV contains a column called primaryWorkEmail, it is as simple as

| lookup mylookup pimaryWorkEmail

If, on the other hand, the corresponding column in lookup is eMail, for example, do

| lookup mylookup pimaryWorkEmail as eMail

Hope this helps

richgalloway · ‎10-10-2022

Does the CSV contain only the Submitted field? If so, what are the values of that field (names, email addresses, etc.)? If the CSV contains other fields, what are they? Something is needed to correlate the CSV to the indexed data.

---
If this reply helps you, Karma would be appreciated.

How to use join to compare results from a csv file?

fields

join

lookup

stats

subsearch

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics!

New in Observability Cloud - Explicit Bucket Histograms