Getting Data In

How to compare a list of disabled users from Source A to a list of application users from Source B, both with different field names for the user account?

kgreat
Path Finder

Hi, the purpose of my search would be to compare two sources (they are both CSV files) to ensure there are no disabled users that exist from Source A that are found in Source B (a list of users from the application). If there are any matches I'd like to output the list of users to a table. The goal is to make sure the application does not still contain orphaned users that have been disabled from the directory. The problem is that some applications still maintain a local list of users rather than leveraging a single source of users as to why this type of comparison is sometimes required.

Source A (Lists All users from the directory and their status)
Field Names:
User ID
User Status (with values either Active or Disabled)
First Name
Last Name

Source B (Lists all users found maintained in an application)
Username
First Name
Last Name

The two different sources use different field names for the user account but it's the same values. Source A uses "Username" and Source B "uses" User ID"

Looking for the most efficient way to compare the two sources. Any users found would be considered orphaned users. We then would notify the application owner so they can disable/delete/remove the account from the application.

Thank you!

1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Try this:

source=A OR source=B | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1

That's taking the fields from both sources, throws it into one user field and then counts the number of sources for each user. Everyone with more than one source is kept in the results.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

Try this:

source=A OR source=B | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1

That's taking the fields from both sources, throws it into one user field and then counts the number of sources for each user. Everyone with more than one source is kept in the results.

martin_mueller
SplunkTrust
SplunkTrust

If the source has both enabled and disabled users then you'll want to filter for only disabled users like this:

(source="All Directory Users Status Rpt.csv" "User Status"="Disabled") OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1

kgreat
Path Finder

That worked! I'm so happy...time to celebrate!!! Thank you Martin!

0 Karma

kgreat
Path Finder

What if I wanted to compare all the Active users in source="All Directory Users Status Rpt.csv to ensure there's a matching "User ID" for each "Username" in source="Siloed Application User List.csv" and for any "User ID" not found/matched pipe to a table.

This search would ensure two things - no Orphaned accounts and also no accounts exist with different usernames.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

That means all users are present in both sources because the distinct count of source values by user is greater than one.

0 Karma

kgreat
Path Finder

All users are at least listed in the first source either as "User Status"=Disabled or "User Status"="Active" with an associated "User ID". Then the second source contains a small number of those users. The goal of the search is to verify that nobody from the first source with "User Status"=Disabled shows up in the second list. And if there are any Disabled users then pipe those usernames/user IDs to a table.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Yup, my search would yield that user. If it doesn't, take a step back and run this:

source="All Directory Users Status Rpt.csv" OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID')

Check it has one entry from source A and one entry from source B, where both entries have user=obad.

0 Karma

kgreat
Path Finder

Hi Martin, I'm still getting back every user in both lists which is a total of 62, 317 users with both lists combined.

0 Karma

kgreat
Path Finder

It appears by using the following search command that I'm just adding both sources together which is not result I was looking for.

source="All Directory Users Status Rpt.csv" OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1

As a test I inserted one Disabled User ID/Username in both sources so the ideal results should return at least this one bad user.

Here are the Fields and values from source="All Directory Users Status Rpt.csv"
"User ID"=obad
"First Name"=Bad
"Last Name"=Orphan
"Termination Date"=3/18/15 21:46
"User Status"=Disabled

Here are the Fields and values from source="Siloed Application User List.csv"
Username=obad
"First Name"=Bad
"Last Name"=Orphan

Because the "User Status"=Disabled in the source="All Directory Users Status Rpt.csv" and there is a matching Username in the source="Siloed Application User List.csv" the results should identify the existing user from source="Siloed Application User List.csv".

I hope that makes more sense. Thank you!

0 Karma
Get Updates on the Splunk Community!

Earn a $35 Gift Card for Answering our Splunk Admins & App Developer Survey

Survey for Splunk Admins and App Developers is open now! | Earn a $35 gift card!      Hello there,  Splunk ...

Continuing Innovation & New Integrations Unlock Full Stack Observability For Your ...

You’ve probably heard the latest about AppDynamics joining the Splunk Observability portfolio, deepening our ...

Monitoring Amazon Elastic Kubernetes Service (EKS)

As we’ve seen, integrating Kubernetes environments with Splunk Observability Cloud is a quick and easy way to ...