Hi, the purpose of my search would be to compare two sources (they are both CSV files) to ensure there are no disabled users that exist from Source A that are found in Source B (a list of users from the application). If there are any matches I'd like to output the list of users to a table. The goal is to make sure the application does not still contain orphaned users that have been disabled from the directory. The problem is that some applications still maintain a local list of users rather than leveraging a single source of users as to why this type of comparison is sometimes required.
Source A (Lists All users from the directory and their status)
Field Names:
User ID
User Status  (with values either Active or Disabled)
First Name
Last Name
Source B (Lists all users found maintained in an application)
Username
First Name
Last Name
The two different sources use different field names for the user account but it's the same values. Source A uses "Username" and Source B "uses" User ID"
Looking for the most efficient way to compare the two sources. Any users found would be considered orphaned users. We then would notify the application owner so they can disable/delete/remove the account from the application.
Thank you!
 
		
		
		
		
		
	
			
		
		
			
					
		Try this:
source=A OR source=B | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1
That's taking the fields from both sources, throws it into one user field and then counts the number of sources for each user. Everyone with more than one source is kept in the results.
 
		
		
		
		
		
	
			
		
		
			
					
		Try this:
source=A OR source=B | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1
That's taking the fields from both sources, throws it into one user field and then counts the number of sources for each user. Everyone with more than one source is kept in the results.
 
		
		
		
		
		
	
			
		
		
			
					
		If the source has both enabled and disabled users then you'll want to filter for only disabled users like this:
(source="All Directory Users Status Rpt.csv" "User Status"="Disabled") OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1
That worked! I'm so happy...time to celebrate!!! Thank you Martin!
What if I wanted to compare all the Active users in source="All Directory Users Status Rpt.csv to ensure there's a matching "User ID" for each "Username" in source="Siloed Application User List.csv" and for any "User ID" not found/matched pipe to a table.
This search would ensure two things - no Orphaned accounts and also no accounts exist with different usernames.
 
		
		
		
		
		
	
			
		
		
			
					
		That means all users are present in both sources because the distinct count of source values by user is greater than one.
All users are at least listed in the first source either as "User Status"=Disabled or "User Status"="Active" with an associated "User ID". Then the second source contains a small number of those users. The goal of the search is to verify that nobody from the first source with "User Status"=Disabled shows up in the second list. And if there are any Disabled users then pipe those usernames/user IDs to a table.
 
		
		
		
		
		
	
			
		
		
			
					
		Yup, my search would yield that user. If it doesn't, take a step back and run this:
source="All Directory Users Status Rpt.csv" OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID')
Check it has one entry from source A and one entry from source B, where both entries have user=obad.
Hi Martin, I'm still getting back every user in both lists which is a total of 62, 317 users with both lists combined.
It appears by using the following search command that I'm just adding both sources together which is not result I was looking for.
source="All Directory Users Status Rpt.csv" OR source="Siloed Application User List.csv" | eval user = coalesce(Username, 'User ID') | stats dc(source) as dc by user | where dc > 1
As a test I inserted one Disabled User ID/Username in both sources so the ideal results should return at least this one bad user.
Here are the Fields and values from source="All Directory Users Status Rpt.csv"
"User ID"=obad
"First Name"=Bad
"Last Name"=Orphan 
"Termination Date"=3/18/15 21:46
"User Status"=Disabled
Here are the Fields and values from source="Siloed Application User List.csv"
Username=obad
"First Name"=Bad
"Last Name"=Orphan
Because the "User Status"=Disabled in the source="All Directory Users Status Rpt.csv" and there is a matching Username in the source="Siloed Application User List.csv" the results should identify the existing user from source="Siloed Application User List.csv".
I hope that makes more sense. Thank you!
