I would like to set up an identity lookup for Azure AD user accounts in Splunk ES. It looks like the Microsoft Azure Add-on collects the user data using the Microsoft Azure Active Directory Users input. How can I tell which user attributes will be collected, and how do I configure them?
@sleclerc1 I saw you had success with getting this going, can you share what attributes you got back for each user?
Thanks.
Hi @mlichtjx !
There are quite a few attributes 😁 and they will likely be different from what we ingest (e.g. we have some attributes synced with our on-premise infrastructure). However, I would recommend trying Microsoft Graph Explorer to replicate the API query that the add-on uses to pull user data. You might have to meddle with some permissions in your Azure environment to ensure that the user making the query is allowed to pull that data from Azure. To find which URL the add-on is using, check out the file in the add-on's /bin directory (TA-MS-AAD/bin/), named "input_module_MS_AAD_user.py". In our use case, we modified the query with some filters to exclude certain users.
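As a starting point, here's the kind of Microsoft Graph query you can paste into Graph Explorer to see exactly which attributes come back per user. The attribute list and filter below are purely illustrative, and the exact endpoint/API version the add-on calls is in that .py file, so check there first:

```
GET https://graph.microsoft.com/v1.0/users?$select=displayName,userPrincipalName,mail,department,jobTitle,accountEnabled&$filter=accountEnabled eq true
```

Adding $select is also a cheap way to trim the payload once you know which attributes ES actually needs.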
Once we got the data in (we run it on a 24-hour interval), I wrote a scheduled search that transforms the data into a table of ES-relevant fields, then pipes it to "outputlookup" to write the CSV that ES leverages for its identity lookup table; a rough sketch follows.
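Treat this as a sketch only: the sourcetype, the field names, and the lookup file name are assumptions from our environment, and the table headers should match the identity lookup format your ES version expects (identity, first, last, email, bunit, etc.):

```
sourcetype="azure:aad:user" earliest=-24h
| dedup userPrincipalName
| eval identity=userPrincipalName, first=givenName, last=surname, email=mail, bunit=department
| table identity, first, last, email, bunit, category, priority, watchlist
| outputlookup azure_ad_identities.csv
```

You then register that CSV under Identity Management in ES so the identity framework merges it with your other identity sources.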
Hope this helps! Let me know if you have any additional questions!
Thanks @sleclerc1 !
Thanks for these tips! How's the performance impact of this with 600k users? Good point about testing with Graph Explorer. I unpacked the add-on's .spl package and found that Python script to dig into.
We ingest the data via a heavy forwarder (following best practice), and as long as you have decent hardware specs, it shouldn't be that intensive. I would turn debug logs on, however, and make sure the API call retrieves all the user data you're looking for. In our case, without using a filter, we were ingesting close to 800K users, but the logs showed that the pagination used by the Graph query would fail near the end. Rather than troubleshoot the error, we opted to use filters to bring in only the users we cared about the most (an example is below), and the errors subsided.
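For instance (purely illustrative; your filter criteria will differ), a Graph $filter that drops guest and disabled accounts looks like this. Note that filtering on userType counts as an "advanced query" in Microsoft Graph, which is why the $count parameter and ConsistencyLevel header are required:

```
GET https://graph.microsoft.com/v1.0/users?$count=true&$filter=userType eq 'Member' and accountEnabled eq true
ConsistencyLevel: eventual
```

Fewer pages to walk means fewer chances for the pagination to trip up mid-pull.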
Thanks, I am going to check this out.