Splunk Observability Cloud’s AI Assistant in Action Series: Analyzing and Troubleshooting in Real-Time

CaitlinHalla
Splunk Employee

This is the second post in our Splunk Observability Cloud’s AI Assistant in Action series, in which we look at how to use the Splunk AI Assistant by digging into some practical, real-world, real-time examples. This series explores the specific use cases of:

  1. Identifying unknown unknowns
  2. Analyzing and troubleshooting in real-time 
  3. Auditing compliance and cost 
  4. Explaining metrics or providing feedback
  5. Onboarding new hires or new users of Splunk Observability Cloud
  6. Observability as Code

In our first post in this series, we saw how to access the Splunk AI Assistant and then explored how we could identify unknown unknowns within our service environment. The AI Assistant identified that our latest release could be causing high error rates and high latency. Before jumping into this second post, check out that first post: Identifying Unknown Unknowns.

This second post picks up where we left off and jumps into using the Splunk AI Assistant to:

  • Perform deploy and release checks
  • Investigate database query performance
  • Interrogate active detectors and alerts

Deploy and Release Checks

We can use the AI Assistant to compare a new code release to previous versions. Let’s see this in action by asking it to compare Payment service performance for the current and previous versions in our Online Boutique environment, and to identify changes in behavior or error patterns for the latest release:

CaitlinHalla_0-1748365086656.png

The AI Assistant returns a comparison showing that the latest version of our service, version 350.10, has much higher latency than the previous version, version 350.9:

CaitlinHalla_1-1748365086632.png

Scrolling down to the error rate analysis, we can see version 350.10 has recorded over 30,000 errors, while version 350.9 has 0 errors. The analysis also shows that the common error for the latest version is “Invalid request with HTTP status code 401”:

CaitlinHalla_2-1748365086638.png

The AI Assistant also highlights observations and recommendations for further investigation and resolution of the issue: 

CaitlinHalla_3-1748365086665.png

The Observations section shows that the latency increase and error rate surge are significant and indicate “potential issues with the latest release.”

The Recommendations include ideas for further investigation, analysis of code changes between versions, and rolling back to version 350.9 to maintain service stability.
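To make the comparison concrete, here is a minimal sketch of the kind of per-version summary described above: mean latency and error count grouped by release. It is not how the AI Assistant works internally. The span records and field names (`version`, `duration_ms`, `error`) are made up for illustration and are not a Splunk schema.

```python
from statistics import mean

# Hypothetical span records; values and field names are illustrative only.
spans = [
    {"version": "350.9", "duration_ms": 120, "error": False},
    {"version": "350.9", "duration_ms": 140, "error": False},
    {"version": "350.10", "duration_ms": 900, "error": True},
    {"version": "350.10", "duration_ms": 1100, "error": True},
    {"version": "350.10", "duration_ms": 1000, "error": False},
]


def summarize_by_version(records):
    """Group spans by version and return mean latency and error count for each."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["version"], []).append(record)
    return {
        version: {
            "mean_latency_ms": mean(r["duration_ms"] for r in rows),
            "error_count": sum(1 for r in rows if r["error"]),
        }
        for version, rows in grouped.items()
    }


summary = summarize_by_version(spans)
for version, stats in summary.items():
    print(version, stats)
```

A large gap between versions in either number is the same signal the Observations section calls out as a potential issue with the latest release.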

Analyze Trace Errors

While investigating real-time issues, looking at specific trace data is helpful to get a deeper understanding of the root cause. We can use the AI Assistant to analyze trace errors for this troubleshooting in real-time use case. Here, we ask the AI Assistant to analyze trace errors specifically for our Payment service in our Online Boutique environment over the last 15 minutes:

CaitlinHalla_4-1748365086658.png

It’s important to note that being more specific with your AI Assistant queries (for example, specifying services, environments, time windows, and data types) will help the assistant return more specific and useful responses.
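One way to keep prompts consistently specific is to assemble them from explicit parameters. The helper below is a hypothetical sketch: `build_prompt` is not part of any Splunk API, and it only produces text that you would paste into the AI Assistant.

```python
def build_prompt(task, service, environment, window):
    """Compose a scoped natural-language prompt from explicit parameters."""
    return (
        f"{task} for the {service} service "
        f"in the {environment} environment over the last {window}."
    )


prompt = build_prompt("Analyze trace errors", "Payment", "Online Boutique", "15 minutes")
print(prompt)
```

Requiring every parameter makes it harder to leave out the service, environment, or time window by accident.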

In the response for our current prompt, we can see a list of trace errors for the Payment service with dynamic links to each of the traces:

CaitlinHalla_5-1748365086639.png

We can then ask for a deeper analysis of traces by prompting the AI Assistant with a specific trace ID:

CaitlinHalla_6-1748365086660.png

The AI Assistant identifies the root cause of the errors as a 401 status code, meaning unauthorized access, returned by one of our third-party APIs: the ButtercupPayments service. The response includes the error message and error tags for the invalid request, along with an analysis of trace performance:

CaitlinHalla_7-1748365086662.png

Scrolling down in the response, we can see the Performance Issues section, which provides helpful information about the duration of the spans within our specific trace. The duration for this trace was relatively short, which hints that the request was rejected almost immediately. This section also highlights the service interaction, which explains how the data is requested. In this case, our Payment service is trying to call the Buttercup Payments API and failing due to this authorization issue:

CaitlinHalla_8-1748365086664.png

Looking at the recommendations, the AI Assistant suggests reviewing the authentication mechanisms for the Buttercup Payments API to ensure the credentials or tokens are valid, along with a couple of other recommendations: 

CaitlinHalla_9-1748365086660.png

Selecting the hyperlink at the end of this response will take us to the trace in Splunk APM so we can look at the data in detail and explore further:

CaitlinHalla_10-1748365086595.png
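As a rough illustration of the reasoning in this section, the sketch below scans the spans of a single trace for HTTP 401 responses and reports how quickly each one failed. The span data and key names (`service`, `operation`, `duration_ms`, `http_status`) are invented for the example and do not reflect the Splunk APM data model.

```python
# Hypothetical spans from one trace; keys and values are illustrative only.
trace_spans = [
    {"service": "frontend", "operation": "POST /cart/checkout", "duration_ms": 41, "http_status": 500},
    {"service": "checkoutservice", "operation": "PlaceOrder", "duration_ms": 35, "http_status": 500},
    {"service": "paymentservice", "operation": "POST ButtercupPayments", "duration_ms": 6, "http_status": 401},
]


def find_unauthorized_spans(spans):
    """Return spans that failed with HTTP 401, shortest duration first."""
    failures = [s for s in spans if s["http_status"] == 401]
    return sorted(failures, key=lambda s: s["duration_ms"])


for span in find_unauthorized_spans(trace_spans):
    print(f'{span["service"]}: {span["operation"]} rejected after {span["duration_ms"]} ms')
```

A 401 on a very short span is consistent with a request rejected at the authentication step, before any real work was done.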

Database Query Performance

Not only can we ask the AI Assistant to analyze traces and metrics related to our services, but we can also ask it to analyze database performance and investigate database queries. We can ask detailed questions like, “What are the top 5 worst performing database queries in the online boutique environment?”

In the response to this question, we get a list of the database queries that have high latency, along with the number of times those queries were executed: 

CaitlinHalla_11-1748365086660.png
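For intuition, a “top 5 worst performing queries” ranking boils down to sorting by latency and keeping the slowest entries. The sketch below assumes a hypothetical list of query statistics. The statements, numbers, and field names are made up, and the AI Assistant may rank queries differently.

```python
import heapq

# Hypothetical query statistics; statements and numbers are illustrative only.
query_stats = [
    {"statement": "SELECT * FROM orders WHERE id = ?", "mean_latency_ms": 850, "executions": 1200},
    {"statement": "SELECT * FROM products", "mean_latency_ms": 640, "executions": 300},
    {"statement": "UPDATE carts SET items = ? WHERE user_id = ?", "mean_latency_ms": 410, "executions": 2500},
    {"statement": "SELECT name FROM users WHERE id = ?", "mean_latency_ms": 35, "executions": 9000},
    {"statement": "INSERT INTO payments VALUES (?, ?, ?)", "mean_latency_ms": 980, "executions": 700},
    {"statement": "DELETE FROM sessions WHERE expired = 1", "mean_latency_ms": 220, "executions": 50},
]


def worst_queries(stats, n=5):
    """Return the n queries with the highest mean latency, slowest first."""
    return heapq.nlargest(n, stats, key=lambda q: q["mean_latency_ms"])


for q in worst_queries(query_stats):
    print(f'{q["mean_latency_ms"]:>5} ms  x{q["executions"]:<5} {q["statement"]}')
```

Showing the execution count next to latency, as the response does, helps separate a slow query that rarely runs from one that is both slow and frequent.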

Detector and Alert Insights

The Splunk AI Assistant can also provide insight into detectors and alerts. If we navigate to Detectors & SLOs in Splunk Observability Cloud and select critical alerts, we can see we have an active high error rate alert for our Payment service: 

CaitlinHalla_12-1748365086647.png

If we select that alert and scroll down in the alert window, we can get the Incident ID of this specific alert: 

CaitlinHalla_13-1748365086627.png

We can copy that Incident ID and use it to ask the AI Assistant to explain this specific alert and tell us why it was triggered. In the response, the AI Assistant provides a summary of the alert details along with an explanation of the triggering conditions and why they were met:

CaitlinHalla_14-1748365086621.png

We started investigating this specific alert by navigating to the Detectors & SLOs page in Splunk Observability Cloud, but we could have also queried the AI Assistant to get information about active or frequently triggered detectors. The response to this query helps guide where to focus our attention:

CaitlinHalla_15-1748365086657.png

If these detectors are frequently triggered, it could indicate service performance degradation that impacts customer experience. The response to queries like this can also help us easily identify alerts that are triggered too frequently and are not actionable, which helps reduce alert noise.
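The idea of spotting noisy detectors can be sketched as a simple frequency count over alert events. The detector names and the threshold below are hypothetical. In practice the event list would come from your own alert history, which this example does not fetch.

```python
from collections import Counter

# Hypothetical alert events, one entry per time a detector fired.
alert_events = [
    "Payment error rate",
    "Payment error rate",
    "Cart latency",
    "Payment error rate",
    "Disk usage",
    "Cart latency",
]


def frequent_detectors(events, threshold):
    """Return (detector, count) pairs that fired at least `threshold` times."""
    counts = Counter(events)
    return [(name, count) for name, count in counts.most_common() if count >= threshold]


for name, count in frequent_detectors(alert_events, threshold=2):
    print(f"{name}: fired {count} times")
```

Detectors at the top of this list are candidates for either urgent investigation or threshold tuning, depending on whether their alerts are actionable.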

Wrap up

To summarize, this second post in our Splunk Observability Cloud’s AI Assistant in Action series focused on using the AI Assistant to perform deploy and release checks, analyze trace errors, investigate database query performance, and interrogate active detectors and alerts. Together, these gave us quick insight into root causes, service performance, and deployment impacts, which speeds up issue resolution.

Stay tuned for our next post, in which we’ll use the AI Assistant to help manage organizational compliance and infrastructure costs.

Want to try out the Splunk AI Assistant for yourself? Start with a 14-day free trial! Already a Splunk Observability Cloud customer? Reach out to your account representative to enable the Splunk AI Assistant!  
