I've been tasked with developing my organization's asset and identity lookups for Splunk ES.
I am using managed lookups to store the data and have a report run once a day to refresh the lookups with new data that may have been ingested from our AD index.
The issue I am running into is that every time the report runs, the values in my lookup tables are duplicating where there are multiple values stored. For example, a device may have the categories a and b, but the next time the report runs it is listed as having the categories a, b, and ab, and the following day a, b, and abab with this cycle repeating forever. I have attempted to use the dedup and mvdedup function between the inputlookup and outputlookup statements but it does not work. I have pasted a sanitized version of the identity lookup generator report's SPL below:
index=azuread sourcetype=azure:aad:user
| eval bunit=<bunit eval statement>
| eval alias=onPremisesSamAccountName
| eval priority=if(MATCH(userType,"Member"),"medium", priority)
| eval priority=if(MATCH(userType,"Guest"),"low", priority)
| eval category=if(MATCH(userType,"Member"), mvappend("Member",category), category)
| eval category=if(MATCH(userType,"Guest"), mvappend("Guest",category), category)
| eval category=if(LIKE(jobTitle,"%Chief%"), mvappend("Management",category), category)
| eval category=if(LIKE(jobTitle,"%Chief%"), mvappend("Executive",category), category)
| eval category=if(LIKE(jobTitle,"%Manager%"), mvappend("Management",category), category)
| eval identity=mvdedup(lower(mvappend(alias,mail)))
| replace null WITH ""
| inputlookup ad_identities append=t
| stats count values(category) as category, values(mail) as email, values(department) as department, values(displayName) as nick, values(jobTitle) as title, values(bunit) as bunit, values(mobilePhone) as phone, values(givenName) as first, values(surname) as last, values(city) as work_city, values(lastPasswordChangeDateTime) as lastPasswordChangeDateTime, values(alias) as alias, values(identity) as identity, values(priority) as priority by userPrincipalName
| eval bunit=mvdedup(bunit)
| eval category=mvdedup(category)
| fields - count
| outputlookup ad_identities
Any advice would be greatly appreciated.
When you run outputlookup on a row with multivalue fields, the lookup will flatten the multi value fields to space (or comma - I forget) separated values.
If you want the lookup file to preserve MV then you must use
| outputlookup ad_identities output_format=splunk_mv_csv
which will preserve MV fields.
You also do not need to use stats count and then remove count. Just don't use count in your stats command
When you run outputlookup on a row with multivalue fields, the lookup will flatten the multi value fields to space (or comma - I forget) separated values.
If you want the lookup file to preserve MV then you must use
| outputlookup ad_identities output_format=splunk_mv_csv
which will preserve MV fields.
You also do not need to use stats count and then remove count. Just don't use count in your stats command
This worked! I just had to make sure to clear the fields that had duplicating issues in the managed lookup editor before running the report. Thanks!