Splunk Search

How to create a search that will compare field values from two different CSV files using different field names and output outliers to a table?

kgreat
Path Finder

Hello, I have two User List CSV files that I want to compare and find any outliers.

SourceA is called "UserDirectory", a CSV file that contains a list of users and their "User Status", with values of either "Active" or "Disabled". This list is extracted from our company Active Directory.
Field Names are:
First Name
Last Name
Email Address
User ID
User Status (with field values either "Active" or "Disabled")

SourceB is called "AppUserList", a CSV file that contains a list of users from one of our internal applications. The application is not integrated with the directory, so users are not removed or disabled in it until the application admin is notified to do so. That can be a huge problem if an employee is terminated and their manager has not followed all offboarding procedures.
Field Names are:
FirstName
LastName
Email
UserName

The two sources contain the same data for "Email" and "Email Address", just under different field names. The same is true of "User ID" and "UserName". So we should be able to correlate and map users with either pair of fields.

My challenges have been the following:
- Correlating these two sources that use different field names.
- Knowing whether it would be better to use "UserDirectory" as a source or as a lookup table.
- How to set up the search to find outliers, which could fall into two scenarios:
*Disabled users in Source A "UserDirectory" that have matching field values and still exist in Source B "AppUserList", or
*Users that exist in Source B "AppUserList" but have no match in Source A "UserDirectory", which could be rogue accounts.

So basically, I would like the search to output any of these scenarios to a table so I can kick off an email to the application admin, alerting them that there are users still in the application who are 1) no longer "Active" in the directory or 2) never existed in the directory. Once the admin gets the message, they will be able to investigate the discrepancies.

I'll need to set up a similar search for each critical app that is not integrated with the directory and is not de-provisioned automatically when AD users are disabled. Usually, these are terminated users.

Thank you for your insight and help!

vganjare
Builder

Hi,

You can index the UserDirectory records (assuming the file is small and will not consume too much of your Splunk license). Then you can set up a scheduled search that takes the latest unique records from UserDirectory (i.e., the latest 70,000 unique records). This scheduled search ends with an outputlookup command, which regenerates the lookup dynamically.
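A minimal sketch of such a scheduled search is shown here. The index name, sourcetype, time range, and output file name are all assumptions chosen for illustration, and the quoted field names assume Splunk extracted the fields exactly as they appear in the CSV header; if your field names were normalized (for example to User_ID), adjust accordingly.

```spl
index=user_directory sourcetype=user_directory_csv earliest=-24h
| rename "First Name" AS first_name, "Last Name" AS last_name, "Email Address" AS email, "User ID" AS user_id, "User Status" AS user_status
| dedup user_id
| table first_name last_name email user_id user_status
| outputlookup user_directory.csv
```

Events come back newest first, so dedup keeps the most recent record per user_id. Scheduling this to run after each daily load should keep the lookup file in step with the directory export.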

Thanks!!

kgreat
Path Finder

Thank you. Can you provide example syntax for how to set up the search? Thanks!

kgreat
Path Finder

If the directory CSV file changes on a daily basis, is there a way for Splunk to ingest a new CSV file daily? Or will I need to update the lookup file manually?

kgreat
Path Finder

The CSV file containing the users from the directory has about 70,000 records.

felipesewaybric
Contributor

Maybe you could import your CSV files as lookup table files (Settings > Lookups).
Then you can use something like:
| inputlookup something.csv
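Building on that, here is a rough sketch of how the comparison itself might look once both files are uploaded as lookup table files. The file names AppUserList.csv and UserDirectory.csv are assumptions based on the source names in the question, and user_status and issue are field names made up for this example.

```spl
| inputlookup AppUserList.csv
| lookup UserDirectory.csv "Email Address" AS Email OUTPUT "User Status" AS user_status
| eval issue=case(isnull(user_status), "Not found in directory", user_status=="Disabled", "Disabled in directory")
| where isnotnull(issue)
| table FirstName LastName Email UserName issue
```

The lookup command maps the differently named fields with AS, so neither file needs to be edited. Users with no directory match get a null user_status and are flagged as not found, and matched users are flagged only when their status is "Disabled". Note that matching against a CSV file referenced by name is case-sensitive by default, so emails that differ only in case would show up as not found; matching on "User ID" AS UserName instead works the same way.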

vganjare
Builder

Hi,

How large is the UserDirectory file? If it has fewer than 1 million records, a lookup can work in your case.

Thanks!!