Splunk Observability Cloud’s AI Assistant in Action Series: Analyzing and Troubleshooting in Real-Time

CaitlinHalla · 06-05-2025

This is the second post in our Splunk Observability Cloud’s AI Assistant in Action series, in which we look at how to use the Splunk AI Assistant by digging into some practical, real-world, real-time examples. This series explores the specific use cases of:

Identifying unknown unknowns
Analyzing and troubleshooting in real-time
Auditing compliance and cost
Explaining metrics or providing feedback
Onboarding new hires or new users of Splunk Observability Cloud
Observability as Code

In our first post in this series, we saw how to access the Splunk AI Assistant and then explored how we could identify unknown unknowns within our service environment. The AI Assistant identified that our latest release could be causing high error rates and high latency. Before jumping into this second post, check out that first post: Identifying Unknown Unknowns.

This second post picks up where we left off and jumps into using the Splunk AI Assistant to:

Perform deploy and release checks
Investigate database query performance
Interrogate active detectors and alerts

Deploy and Release Checks

We can use the AI Assistant to compare new code releases to previous versions. Let’s see this in action by asking the AI Assistant to compare the current latency of our Payment service to the prior version. We’ll ask it to compare Payment service performance for the current and previous versions in our Online Boutique environment and identify changes in behavior or error patterns for the latest release:

The AI Assistant returns a comparison that shows the latest version of our service, version 350.10, has a much higher latency than the previous version, version 350.9:

Scrolling down to the error rate analysis, we can see version 350.10 logged over 30,000 errors, while version 350.9 logged none. The analysis also shows that the common error for the latest version is “Invalid request with HTTP status code 401”:

The AI Assistant also highlights observations and recommendations for further investigation and resolution of the issue:

The Observations section shows that the latency increase and error rate surge are significant and indicate “potential issues with the latest release.”

The Recommendations include ideas for further investigation, analysis of code changes between versions, and rolling back to version 350.9 to maintain service stability.
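The comparison in this section can be pictured as a per-version error rate check. The Python sketch below is only an illustration of that arithmetic: `ReleaseStats` and `error_rate` are hypothetical names, not part of Splunk Observability Cloud. The error counts follow the figures above, with 30,000 used as a floor for "over 30,000". The request totals are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStats:
    version: str
    requests: int  # assumed request total; not reported in the post
    errors: int    # error count for the release


def error_rate(stats):
    """Return errors as a fraction of requests (0.0 when there is no traffic)."""
    return stats.errors / stats.requests if stats.requests else 0.0


# Error counts come from the comparison above; request totals are illustrative.
current = ReleaseStats("350.10", requests=100_000, errors=30_000)
previous = ReleaseStats("350.9", requests=100_000, errors=0)

for release in (previous, current):
    print(f"version {release.version}: error rate {error_rate(release):.1%}")

# A jump from zero errors to a large share of requests is the kind of
# regression that points at the latest release.
regressed = error_rate(current) > error_rate(previous)
print("regression detected:", regressed)
```

Normalizing by request volume matters because a raw error count alone cannot distinguish a broken release from a busy one.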

Analyze Trace Errors

While investigating real-time issues, looking at specific trace data helps us get a deeper understanding of the root cause. We can use the AI Assistant to analyze trace errors for this real-time troubleshooting use case. Here, we ask the AI Assistant to analyze trace errors specifically for our Payment service in our Online Boutique environment over the last 15 minutes:

It’s important to note that being more specific with your AI Assistant queries (for example, specifying services, environments, time windows, and data types) will help the assistant return more specific and useful responses.
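To make that advice concrete, here is a minimal Python sketch that assembles a scoped prompt from its parts. The `build_prompt` helper and its parameters are hypothetical and exist only for illustration; the AI Assistant itself simply accepts natural-language text, and nothing here is a Splunk API.

```python
def build_prompt(task, service=None, environment=None, window=None):
    """Assemble a natural-language prompt, adding only the scope details provided."""
    parts = [task]
    if service:
        parts.append(f"for the {service} service")
    if environment:
        parts.append(f"in the {environment} environment")
    if window:
        parts.append(f"over the last {window}")
    return " ".join(parts) + "."


# A vague prompt leaves the assistant to guess the scope.
vague = build_prompt("Analyze trace errors")

# A specific prompt names the service, environment, and time window.
specific = build_prompt(
    "Analyze trace errors",
    service="Payment",
    environment="Online Boutique",
    window="15 minutes",
)

print(vague)
print(specific)
```

The second prompt mirrors the one used in this section: it names the service, the environment, and the time window instead of leaving them open.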

In the response for our current prompt, we can see a list of trace errors for the Payment service with dynamic links to each of the traces:

We can then ask for a deeper analysis of traces by prompting the AI Assistant with a specific trace ID:

The AI Assistant identifies the root cause of the errors as a 401 status code (unauthorized access) on calls to one of our third-party APIs: the ButtercupPayments service. The response includes the error message and error tags for the invalid request, along with an analysis of trace performance:

Scrolling down in the response, we can see the Performance Issues section, which provides helpful information about the duration of the spans within our specific trace. The duration for this trace was relatively short, which suggests the request was rejected almost immediately. This section also highlights the service interaction, which explains how the data is requested. In this case, our Payment service is trying to call the Buttercup Payments API and failing because of this authorization issue:

Looking at the recommendations, the AI Assistant suggests reviewing the authentication mechanisms for the Buttercup Payments API to ensure the credentials or tokens are valid, along with a couple of other recommendations:

Selecting the hyperlink at the end of this response will take us to the trace in Splunk APM so we can look at the data in detail and explore further:

Database Query Performance

Not only can we ask the AI Assistant to analyze traces and metrics related to our services, but we can also ask it to analyze database performance and investigate database queries. We can ask detailed questions like, “What are the top 5 worst performing database queries in the online boutique environment?”

In the response to this question, we get a list of the database queries that have high latency, along with the number of times those queries were executed:
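Conceptually, a "top 5 worst performing" question is a ranking by latency. The Python sketch below shows that idea on made-up sample data. The query text, latencies, and execution counts are illustrative placeholders, not output from the AI Assistant, and `worst_queries` is a hypothetical helper rather than a Splunk API.

```python
import heapq
from dataclasses import dataclass


@dataclass
class QueryStats:
    statement: str         # normalized query text
    avg_latency_ms: float  # average latency in milliseconds
    executions: int        # how many times the query ran


def worst_queries(stats, n=5):
    """Return the n queries with the highest average latency, slowest first."""
    return heapq.nlargest(n, stats, key=lambda q: q.avg_latency_ms)


# Illustrative sample data only; not real output from Splunk Observability Cloud.
sample = [
    QueryStats("SELECT * FROM products", 840.0, 1200),
    QueryStats("SELECT * FROM carts WHERE user_id = ?", 95.5, 5400),
    QueryStats("UPDATE inventory SET qty = ? WHERE sku = ?", 410.2, 310),
    QueryStats("SELECT * FROM orders WHERE id = ?", 32.1, 9800),
    QueryStats("INSERT INTO payments VALUES (?, ?, ?)", 1220.7, 640),
    QueryStats("SELECT * FROM shipping_rates", 270.0, 150),
    QueryStats("DELETE FROM sessions WHERE expires < ?", 18.4, 75),
]

top = worst_queries(sample)
for q in top:
    print(f"{q.avg_latency_ms:8.1f} ms  x{q.executions:<5}  {q.statement}")
```

Keeping the execution count next to the latency is useful when prioritizing: a slow query that runs thousands of times usually deserves attention before a slower one that runs rarely.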

Detector and Alert Insights

The Splunk AI Assistant can also provide insight into detectors and alerts. If we navigate to Detectors & SLOs in Splunk Observability Cloud and select critical alerts, we can see we have an active high error rate alert for our Payment service:

If we select that alert and scroll down in the alert window, we can get the Incident ID of this specific alert:

We can copy that Incident ID and then use it to ask the AI Assistant to explain this specific alert and tell us why it was triggered. In the response, the AI Assistant provides a summary of the alert details along with an explanation of the triggering conditions and why they were met:

We started investigating this specific alert by navigating to the Detectors & SLOs page in Splunk Observability Cloud, but we could also have asked the AI Assistant for information about active or frequently triggered detectors. The response to this query shows us where to focus our attention:

Frequently triggered detectors can signal service performance degradation that impacts customer experience. The response to queries like this can also help us identify alerts that fire too often without being actionable, which helps reduce alert noise.
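One simple way to think about alert noise is to count how often each detector fires in a given window and flag the ones at or above a threshold. The Python sketch below illustrates that idea. The detector names, the event list, and the threshold are all assumptions made up for the example, and `noisy_detectors` is a hypothetical helper, not a Splunk Observability Cloud API.

```python
from collections import Counter


def noisy_detectors(fired, threshold):
    """Return (detector, count) pairs that fired at least `threshold` times, noisiest first."""
    counts = Counter(fired)
    flagged = [(name, n) for name, n in counts.items() if n >= threshold]
    # Sort by count descending, then by name so ties have a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Illustrative event log: one entry per alert, named by the detector that fired.
fired = [
    "payment-high-error-rate",
    "checkout-latency",
    "payment-high-error-rate",
    "disk-usage",
    "payment-high-error-rate",
    "checkout-latency",
]

noisy = noisy_detectors(fired, threshold=2)
print(noisy)
```

A count alone does not say whether a detector is noisy or correctly reporting a real problem, so each flagged detector still needs a judgment call about whether its alerts are actionable.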

Wrap up

To summarize, this second post in our Splunk Observability Cloud’s AI Assistant in Action series focused on using the AI Assistant to perform deploy and release checks, analyze trace errors, investigate database query performance, and interrogate active detectors and alerts. This gave us fast insight into root causes, service performance, and deployment impacts, which speeds up issue resolution.

Stay tuned for our next post, in which we’ll use the AI Assistant to help manage organizational compliance and infrastructure costs.

Want to try out the Splunk AI Assistant for yourself? Start with a 14-day free trial! Already a Splunk Observability Cloud customer? Reach out to your account representative to enable the Splunk AI Assistant!
