You are not the only one seeing this lately. The legacy Message Trace reporting web service behind reports.office365.com has been increasingly unreliable, especially now that Microsoft is clearly pushing people toward the newer reporting and Graph-based endpoints.

When the service starts returning repeated HTTP 500 responses, it is often not related to the query window size at all. In many tenants the issue comes from the backend service itself or from throttling behavior that the legacy endpoint does not surface properly.

A few things that are worth checking:

- Verify whether Message Trace works normally in the Exchange Admin Center (EAC) for the same time window. If the EAC trace also fails or returns delayed results, that usually confirms it is a backend service issue rather than Splunk.
- Check the Service Health Dashboard in Microsoft 365. Microsoft occasionally posts advisories related to message trace delays or reporting pipeline issues.

Since the endpoint you are hitting is part of the older reporting web service, many teams are gradually switching to Graph-based reporting APIs or newer audit pipelines. Unfortunately, that does mean some existing integrations, like the Splunk TA inputs, lag behind.

As a temporary mitigation, some admins reduce polling frequency and expand the query window slightly. This sometimes avoids repeated retries against the service.

One other practical angle some organizations are taking right now is moving the message tracking and reporting workloads into a fresh Microsoft 365 tenant or environment when they are already planning infrastructure changes. During those transitions we have seen teams rely on migration utilities that can move mailboxes, metadata, and permissions cleanly while they rebuild monitoring integrations on the new tenant.
For example, tools like SysTools Office 365 Migration Tool are often used during tenant restructuring or consolidation projects so admins can migrate mailbox data without depending on the reporting pipeline that sometimes causes these Message Trace issues.

Hopefully Splunk updates the TA soon with support for the newer endpoints. Until then, confirming whether the failure also occurs in EAC and checking service health are probably the fastest ways to determine whether this is tenant-side or service-side.
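If you script your own polling while waiting on a TA update, the "poll less often, use a slightly wider window" mitigation mentioned above could look roughly like the sketch below. This is only an illustration and not how the Splunk TA works internally: `query_windows`, `backoff_delays`, and `fetch_with_backoff` are hypothetical helper names, and `fetch` stands in for whatever function you already use to call the Message Trace endpoint and return an `(http_status, body)` pair.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Iterator, List, Sequence, Tuple

Window = Tuple[datetime, datetime]


def query_windows(start: datetime, end: datetime, width: timedelta) -> Iterator[Window]:
    """Yield consecutive (window_start, window_end) pairs covering start..end."""
    cursor = start
    while cursor < end:
        upper = min(cursor + width, end)
        yield cursor, upper
        cursor = upper


def backoff_delays(base: float, factor: float, attempts: int, cap: float) -> List[float]:
    """Exponentially growing waits in seconds, each capped at `cap`."""
    return [min(base * factor ** i, cap) for i in range(attempts)]


def fetch_with_backoff(
    fetch: Callable[[Window], Tuple[int, Any]],  # hypothetical: your own HTTP call
    window: Window,
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call fetch(window); while it returns HTTP 5xx, wait and try again.

    Makes at most 1 + len(delays) calls, then returns the last result so
    the caller can decide whether to skip the window or alert.
    """
    status, body = fetch(window)
    for delay in delays:
        if status < 500:
            break
        sleep(delay)
        status, body = fetch(window)
    return status, body
```

You would then loop over `query_windows(...)` with, say, one-hour windows and call `fetch_with_backoff` for each one. Growing the wait between attempts means a struggling backend sees fewer requests from you, instead of a tight retry loop that keeps hitting the same 500.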