Solved: Re: Splunk TA for O365 – Message Trace input fails...

Kamachi · 03-05-2026

Hi there,

We’re seeing consistent ingestion failures with the Message Trace (MT) input in the Splunk Add-on for Microsoft Office 365.
- Authentication to Microsoft 365 succeeds.
- However, every request to the Message Trace endpoint returns HTTP 500, regardless of how small the query window is.
- To rule out an overly large time range, I cloned the MT input and tested it with a very small window (e.g., 15 minutes of today’s data). The request still fails with repeated 500 responses.
- Other inputs are OK.

2026-03-05 13:12:36,078 level=ERROR logger=splunk_ta_o365.modinputs.message_trace
datainput="MT_test1" start_time=1772712742
message="HTTP Request error: HTTPSConnectionPool(host='reports.office365.com', port=443): Max retries exceeded with url:
 /ecp/reportingwebservice/reporting.svc/MessageTrace?$filter=StartDate eq datetime'2026-03-05T00:00:00Z' and EndDate eq datetime'2026-03-05T00:15:00Z'
 (Caused by ResponseError('too many 500 error responses'))"
... (stack trace omitted)
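To reproduce the failing request outside Splunk (for example with curl and a valid token), the URL from the log above can be rebuilt for any window. This is a minimal Python sketch that assumes the add-on queries the endpoint exactly as the logged URL shows, with an OData `datetime'...'` literal in `$filter`. The helper names are hypothetical and authentication is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote

# Base URL copied from the TA log (legacy Reporting Web Service).
BASE_URL = "https://reports.office365.com/ecp/reportingwebservice/reporting.svc/MessageTrace"


def odata_datetime(ts: datetime) -> str:
    """Format a timezone-aware datetime as an OData datetime'...' literal in UTC."""
    return "datetime'" + ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + "'"


def message_trace_filter(start: datetime, end: datetime) -> str:
    """Build the same $filter expression that appears in the TA log."""
    return f"StartDate eq {odata_datetime(start)} and EndDate eq {odata_datetime(end)}"


def message_trace_url(start: datetime, end: datetime) -> str:
    """Percent-encode the filter so the URL can be pasted into curl or a browser."""
    return BASE_URL + "?$filter=" + quote(message_trace_filter(start, end), safe="")


if __name__ == "__main__":
    start = datetime(2026, 3, 5, 0, 0, tzinfo=timezone.utc)
    print(message_trace_url(start, start + timedelta(minutes=15)))
```

Passing timezone-aware datetimes matters here: a naive datetime would be interpreted as local time by `astimezone`, shifting the window.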

Q1: Have you seen this behavior (persistent HTTP 500) from the Message Trace Reporting Webservice endpoint?
Q2: Are there known service-side limitations or tenant-specific issues that can cause this?
Q3: What are the recommended next troubleshooting steps and/or mitigations?
Q4: I know it’s a legacy MT endpoint as of March 2026, but there has been no update to the O365 add-on, and I’m not keen on building an MT input from scratch. Is there any information on when the new update is planned?

Thanks in advance.

SidHeart · 03-05-2026

You are not the only one seeing this lately. The legacy Message Trace reporting webservice behind reports.office365.com has been increasingly unreliable, especially now that Microsoft is clearly pushing people toward the newer reporting and Graph-based endpoints. When the service starts returning repeated HTTP 500 responses, it is often not related to the query window size at all. In many tenants the issue comes from the backend service itself, or from throttling behavior that the legacy endpoint does not surface properly.

A few things that are worth checking:

- Verify whether Message Trace works normally in the Exchange Admin Center (EAC) for the same time window. If the EAC trace also fails or returns delayed results, that usually confirms it is a backend service issue rather than a Splunk one.
- Check the Service Health Dashboard in Microsoft 365. Microsoft occasionally posts advisories about message trace delays or reporting pipeline issues.
- Since the endpoint you are hitting is part of the older reporting webservice, many teams are gradually switching to Graph-based reporting APIs or newer audit pipelines. Unfortunately, that also means some existing integrations, such as the Splunk TA inputs, lag behind.
- As a temporary mitigation, some admins reduce the polling frequency and widen the query window slightly. This sometimes avoids repeated retries against the service.
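The wording in the original log, "Max retries exceeded ... (Caused by ResponseError('too many 500 error responses'))", matches what urllib3's retry logic reports when every retry on a listed status code fails, so the client gave up after repeated 500s. Polling less aggressively amounts to spacing those attempts further apart. Below is a stdlib-only sketch of capped exponential backoff; the request callable, the retryable status set and the default delays are assumptions for illustration, not the add-on's actual settings.

```python
import time
from typing import Callable, List, Tuple

# Assumed set of statuses worth retrying; 500 is what the legacy endpoint keeps returning.
RETRYABLE_STATUSES = frozenset({500, 502, 503, 504})


def backoff_delays(retries: int, base: float = 2.0, cap: float = 300.0) -> List[float]:
    """Capped exponential delays: base, 2*base, 4*base, ..., never above cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def fetch_with_backoff(
    do_request: Callable[[], Tuple[int, str]],
    retries: int = 4,
    base: float = 2.0,
    cap: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call do_request, waiting progressively longer after each retryable status."""
    status, body = do_request()
    for delay in backoff_delays(retries, base, cap):
        if status not in RETRYABLE_STATUSES:
            return status, body
        sleep(delay)
        status, body = do_request()
    if status in RETRYABLE_STATUSES:
        # Same wording urllib3 uses when it runs out of status retries.
        raise RuntimeError(f"too many {status} error responses")
    return status, body
```

Injecting `sleep` keeps the function testable without real waits. If the backend is genuinely broken, as the thread suggests, backoff only reduces load on the service and does not make the requests succeed.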

One other practical angle some organizations are taking right now is moving the message tracking and reporting workloads into a fresh Microsoft 365 tenant or environment when they are already planning infrastructure changes. During those transitions we have seen teams rely on migration utilities that can move mailboxes, metadata, and permissions cleanly while they rebuild monitoring integrations on the new tenant.

For example, tools like SysTools Office 365 Migration Tool are often used during tenant restructuring or consolidation projects so admins can migrate mailbox data without depending on the reporting pipeline that sometimes causes these MT issues.

Hopefully Splunk updates the TA soon with support for the newer endpoints. Until then, confirming whether the failure also occurs in EAC and checking service health are probably the fastest ways to determine whether this is a tenant-side or a service-side issue.


fadeedbeef

As @SidHeart suggested, the issue is caused by instability in the old reporting endpoint.
From version 6.0.0 onwards, the splunk-add-on-for-microsoft-office-365 TA introduces a new Microsoft Graph-based message trace input method that should resolve this issue.

New Sourcetype: o365:graph:messagetrace

TA Documentation describing the update:
https://splunk.github.io/splunk-add-on-for-microsoft-office-365/MigrationGuides/UpdateMessageTraceInput
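Because the Graph-based input requires version 6.0.0 or later, a quick pre-migration check is to read the installed add-on's version. This sketch assumes the version is taken from the `[launcher]` stanza of the add-on's `app.conf` (the usual place for Splunk app metadata, typically under `$SPLUNK_HOME/etc/apps/<app_folder>/default/`; the folder name is not given in the thread) and that the file parses as INI with Python's `configparser`. Splunk `.conf` files are close to INI but not identical. The helper names are hypothetical.

```python
import configparser
from typing import Tuple

# Threshold from this thread: the Graph-based MT input arrives in 6.0.0.
MIN_GRAPH_MT_VERSION = (6, 0, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '6.0.0' into (6, 0, 0); drop non-numeric suffixes and pad to 3 parts."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # so '6.1' compares as (6, 1, 0)
    return tuple(parts)


def supports_graph_message_trace(app_conf_text: str) -> bool:
    """Read [launcher] version from app.conf text and compare it to 6.0.0."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(app_conf_text)
    version = parser.get("launcher", "version", fallback="0")
    return parse_version(version) >= MIN_GRAPH_MT_VERSION
```

Working on the file's text rather than a path keeps the sketch independent of where the add-on is installed; read the file yourself and pass its contents in.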

Kamachi

I migrated to the new endpoint as soon as it became available. The problem occurred before MS released the new MT endpoint.
