Does the Azure AD add on retrieve the complete set of sign-in records?

raoul
Path Finder

Based on the data from our tenant, the add-on is not retrieving all of the sign-in records when compared with the Azure Portal sign-ins page.

The number of records loaded appears to be correlated with the polling frequency. I have tried 300s (5m), 600s (10m) and 900s (15m). In each case the add-on loads a different number of underlying events. The effect is quite marked.

[Chart: "ms:aad:signin" events per minute (tpm), 5-minute spans]

Query for the chart above:

index=liquid_it sourcetype="ms:aad:signin"  
|  timechart span=5m count 
| eval tpm=round(count / 5, 2) 
| fields - count

raoul
Path Finder

I set up an alternate ingest pipeline: AAD --> Event Hub --> Azure function --> Splunk HEC

That reliably produces a full set of the events. In the graph, the new ingest is shown as "aad_audit" and the reporting add-on as "ms:aad:signin". The difference is quite marked.

[Chart: event counts for "aad_audit" versus "ms:aad:signin"]
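
For anyone wanting to try the same route, here is a minimal sketch of the last hop (Azure function --> Splunk HEC), not the code actually used in this pipeline. It assumes each Event Hub record is a dict with an ISO-8601 UTC `time` field; `parse_utc` and `to_hec_batch` are hypothetical helper names, and the index is whatever you configure on your side.

```python
import json
from datetime import datetime, timezone


def parse_utc(ts: str) -> float:
    """Convert an ISO-8601 UTC string (optional fractional seconds) to epoch seconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.timestamp() + (float("0." + frac) if frac else 0.0)


def to_hec_batch(records: list, index: str, sourcetype: str = "aad_audit") -> str:
    """Wrap each record in a HEC event envelope.

    HEC accepts several JSON event objects stacked in a single request body,
    so the envelopes are joined with newlines.
    """
    envelopes = [
        {
            "time": parse_utc(record["time"]),  # assumed field name on the record
            "index": index,
            "sourcetype": sourcetype,
            "event": record,
        }
        for record in records
    ]
    return "\n".join(json.dumps(envelope) for envelope in envelopes)
```

The resulting body would be POSTed to the HEC event endpoint (`/services/collector/event`) with an `Authorization: Splunk <token>` header. Setting `time` from the record keeps the Splunk event time equal to the sign-in time rather than the ingest time, which makes side-by-side comparisons with the add-on easier.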

jijulukose
Explorer

Here's a simple fix for the app, if the developer is watching this thread: in the API call, add '+and+signinDateTime+le+(current time - delay minutes)'. The new filter query will then look like:
&$filter=signinDateTime+gt+(check point time)+and+signinDateTime+le+(current time - delay minutes)

With delay minutes set to 5, this will get 99% of the data, given Microsoft's latency of less than 5 minutes for 99% of events. And if you're making the change, please let the user control the delay minutes.
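
A rough sketch of how that filter could be built, assuming timezone-aware UTC datetimes and a `YYYY-MM-DDTHH:MM:SSZ` timestamp format. `build_signin_filter` is a hypothetical helper, not the add-on's actual code, and the field name is copied from the filter above.

```python
from datetime import datetime, timedelta, timezone


def build_signin_filter(checkpoint: datetime, now: datetime, delay_minutes: int = 5):
    """Return (filter_string, upper_bound) for the window (checkpoint, now - delay].

    Both datetimes must be timezone-aware. The upper bound is returned so the
    caller can store it as the next checkpoint.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format
    upper = now.astimezone(timezone.utc) - timedelta(minutes=delay_minutes)
    lower = checkpoint.astimezone(timezone.utc)
    filter_string = (
        f"signinDateTime+gt+{lower.strftime(fmt)}"
        f"+and+signinDateTime+le+{upper.strftime(fmt)}"
    )
    return filter_string, upper


# Example: checkpoint at 13:00, poll at 13:10, delay of 5 minutes.
checkpoint = datetime(2018, 3, 1, 13, 0, tzinfo=timezone.utc)
now = datetime(2018, 3, 1, 13, 10, tzinfo=timezone.utc)
print(build_signin_filter(checkpoint, now)[0])
```

Storing the returned upper bound as the next checkpoint, rather than the largest timestamp seen, means consecutive windows line up with no gap and no overlap.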

swong2
Path Finder

This Microsoft article https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/reference-reports-latenci... covers the latency for sign-in and audit logs in Azure. The latency is between 2 and 5 minutes. My understanding is that the logs will be available in the Azure portal (and ready for the API to pull) within 5 minutes of the originating event, so I think setting the polling frequency to >300s should be OK. However, I am concerned that this add-on uses the largest signinDateTime/activityDateTime seen during the query as the checkpoint timestamp. Azure logs may arrive out of order, so we will miss events that arrive late but whose originating timestamps are before the checkpoint.

I have the following scenario in mind:

  1. My sign-ins input starts at 1:10pm (with a polling interval of 10 minutes) and the current checkpoint is 1:00pm.
  2. The 1st query runs and pulls logs from 1:00pm to 1:10pm. The add-on sets the largest signinDateTime as the checkpoint. (Let’s say the largest sign-in time seen in the query is 1:07pm; the checkpoint is now 1:07pm.)
  3. Suppose a sign-in event happened at 1:06pm but its log is not made available until 1:11pm (a 5-minute delay). My 1st query, which ran at 1:10pm, missed this log. That’s OK, as I expect the next query to pick it up.
  4. At 1:20pm my 2nd query runs. However, it only pulls logs from 1:07pm (the current checkpoint) to 1:20pm, so my 1:06pm sign-in event is skipped.
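
The scenario above can be reproduced with a small simulation. This is only a toy model of my understanding, not the add-on's code: times are minutes past 1:00pm, and `SignIn`, `poll`, `run_max_seen` and `run_delayed` are made-up names. The second strategy caps each query at "now minus delay" and uses that bound as the checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignIn:
    event_id: str
    occurred: int   # minutes past 1:00pm when the sign-in happened
    available: int  # minutes past 1:00pm when the API can first return it


def poll(events, checkpoint, now):
    """Events the API returns at `now` with checkpoint < occurred <= now."""
    return [e for e in events if checkpoint < e.occurred <= now and e.available <= now]


def run_max_seen(events, poll_times, checkpoint=0):
    """Checkpoint is the largest sign-in time seen in the previous query."""
    ingested = []
    for now in poll_times:
        batch = poll(events, checkpoint, now)
        ingested += [e.event_id for e in batch]
        if batch:
            checkpoint = max(e.occurred for e in batch)
    return ingested


def run_delayed(events, poll_times, delay=5, checkpoint=0):
    """Query only up to now - delay, and use that bound as the next checkpoint."""
    ingested = []
    for now in poll_times:
        upper = now - delay
        batch = [e for e in poll(events, checkpoint, now) if e.occurred <= upper]
        ingested += [e.event_id for e in batch]
        checkpoint = upper
    return ingested


events = [
    SignIn("a", occurred=3, available=5),
    SignIn("late", occurred=6, available=11),  # the 1:06pm event, visible at 1:11pm
    SignIn("b", occurred=7, available=9),
    SignIn("c", occurred=15, available=17),
]

print(run_max_seen(events, [10, 20]))  # ['a', 'b', 'c'] -> "late" is skipped
print(run_delayed(events, [10, 20]))   # ['a', 'late', 'b', 'c']
```

The delayed variant only holds up if the delay is at least as long as the worst latency; anything slower than that would still be missed.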

As suggested by jconger, the "Azure Monitor Add-on for Splunk" may be the better way to collect near-real-time data from an Event Hub.

FYI: I have been trying to collect Azure AD logs (sign-in, audit), Azure AD risk events, and Office 365 logs into Splunk. I feel in general the latencies in the Microsoft reporting infrastructure causing lots of confusion/issues on how we can properly schedule our data ingestion without incomplete/duplicate data problem. It makes it harder to use the data for a near-real-time monitoring/reporting solution.

raoul
Path Finder

Completely agree with your statement:

the latencies in the Microsoft reporting infrastructure causing lots of confusion/issues on how we can properly schedule our data ingestion without incomplete/duplicate data problem

jconger
Splunk Employee

Version 1.0.3 of the Azure AD Reporting Add-on has some data collection improvements that should address your issue. Also, Azure AD logs can now be sent to Event Hubs, and the Azure Monitor Add-on for Splunk can be used to collect them from an Event Hub.

raoul
Path Finder

Thanks for the suggestion, will try this out. Have set up the event hub and can see activity. Will be interesting to do a side-by-side and see if I get a more complete set of events via this route than the API-based reporting add-on.

raoul
Path Finder

Ok, the results are in and on this basis I can see that the Azure AD Reporting Add-on is missing events.

I set up an alternate ingest pipeline: AAD --> Event Hub --> Azure function --> Splunk HEC

That reliably produces more events than the reporting add-on.

raoul
Path Finder

Ok, so there is some relationship between the frequency of polling and how many events get ingested. The more frequent the polling the fewer events (the more missing events). The less frequent the polling the more events get ingested for any given period.

I changed the polling from 300s to 600s and the number of events per minute went up by a factor of 3.

jkat54
SplunkTrust

Have you tried the Microsoft cloud services app? It may do what you’re looking for too.

raoul
Path Finder

Thanks, will try that. Initially I did not think it did the sign-ins, but on closer reading it may do.

raoul
Path Finder

Some further details:

  • Splunk Enterprise 7.0.2
  • Set the AAD Reporting add-on to retrieve every 300s
  • After 24 hours of running, I am still only getting a subset of the audit records
  • Can discern no pattern to the missing events; no obvious time boundary issue, and no attribute of the events missing from Splunk that stands out

Needless to say this is a deal-breaker. If the audit in Splunk is not complete it is all but useless.

Not sure how to progress in diagnosing this.

raoul
Path Finder

Changed the polling frequency to 600s to see if that makes a difference.
