By now we know the three key players in observability: metrics, traces, and logs. Metrics help you detect problems within your system. Traces help you troubleshoot where the problems are occurring. Logs help you pinpoint root causes. These observability components (along with others) work together to help you remediate issues quickly.
In our previous post, we discussed how Splunk Observability Cloud can help us detect and troubleshoot problems specifically in our Kubernetes environment. But how can we use our telemetry data to identify exactly what’s causing the problems in the first place? In this post, let’s dig into Splunk Log Observer Connect and see how we can diagnose and resolve issues fast.
Splunk Log Observer Connect is an integration that makes it possible to query log data from your existing Splunk Platform products (Enterprise or Cloud) and use that data alongside metrics and traces, all from within Splunk Observability Cloud. If you’re a Splunk Enterprise or Splunk Cloud Platform customer, you can use Log Observer Connect to view in-context logs, run queries without writing SPL, and jump to Related Content with a single click to quickly detect and resolve system problems.
You can get started with Log Observer Connect by following the setup steps or working with your Support team to add a new connection for Log Observer Connect in Splunk Observability Cloud.
The recommended way to get logs from Kubernetes environments into Splunk is to use the native OpenTelemetry logging capabilities deployed as part of the Helm chart for the Splunk Distribution of the OpenTelemetry Collector. You can also configure logging during the initial integration of the OTel Collector by selecting Log collection and providing your Splunk HEC endpoint and access token:
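If you go the Helm route, log collection comes down to a handful of chart values. Here’s a minimal sketch of a values.yaml, assuming the splunkPlatform settings of the Splunk OpenTelemetry Collector Helm chart; the endpoint, token, index, and cluster name below are placeholders, and exact keys can vary by chart version, so check the chart’s documentation:

```yaml
# values.yaml - minimal sketch for sending Kubernetes logs to the Splunk platform
# over HEC. Key names assume the Splunk OTel Collector Helm chart; verify them
# against your chart version's documentation.
clusterName: my-k8s-cluster                # placeholder cluster name

splunkPlatform:
  endpoint: "https://hec.example.splunk.com:8088/services/collector"  # placeholder HEC endpoint
  token: "00000000-0000-0000-0000-000000000000"                       # placeholder HEC access token
  index: k8s_logs                                                     # placeholder target index

# Use the chart's native OpenTelemetry log collection (rather than Fluentd).
logsEngine: otel

# Applied with something like:
#   helm install splunk-otel-collector -f values.yaml splunk-otel-collector-chart/splunk-otel-collector
```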
Interacting with logs in Splunk Observability Cloud often begins with an alert triggered by some error event like a problem with a Kubernetes cluster.
In Splunk Infrastructure Monitoring’s Kubernetes Navigator (which we toured in a previous post), we can see that we have two such active alerts firing:
Opening them up, we can see critical alerts for memory usage. With a single click on the Troubleshoot link, we can explore further in Splunk Application Performance Monitoring (APM):
This will take us to a Service Map view of our application, where we can see something isn’t right:
Our paymentservice node is highlighted in red, meaning it’s the root cause of our errors. If we select the red circle, we’ll see more info in the panel on the right, along with Infrastructure and Logs Related Content at the bottom of the screen. All of this information is scoped specifically to the selected paymentservice.
We can expand the Logs Related Content:
And then jump directly from there to Log Observer to view logs related to this service error:
With help from the logs, we can get to the bottom of what’s causing these errors. Let’s use the Content Control Bar to filter our logs by keywords or field values. Since we arrived via Related Content, our logs are already filtered to service.name = paymentservice. If we only want to see logs related to paymentservice errors, we can add another filter for severity = error:
If at any time we want to save a query, whether to validate a fix later or to share it with the rest of our team, we can add it to our Saved Queries. Select Save at the top right of the screen, then Save query, and give the query a name and description for later use:
Other users (or your future self) can later apply it from the Saved Query dropdown:
Moving over to the Fields panel on the right of the screen, we can view all the metadata available on entries in the Logs table. This is a great place to filter logs if you’re not sure which fields you’re looking for. Here, we can see a k8s.cluster.name field with its top values listed. In this case, we know which Kubernetes cluster we want to isolate, so we can include all logs for that specific cluster:
We can then click on an individual log entry to see its details:
From the log details, we can see that the error message is “Failed payment processing through ButtercupPayments: Invalid API Token(test-20e26e90-356b-432e-a2c6-956fc03f5609).” Selecting the error message, we can click Add to filter to scope the Logs table to entries containing this exact message:
We’ve also added the version field as a column by selecting the kebab menu next to the field and choosing Add field as column:
Now we can easily scan the Logs table and identify which errors are associated with which version. At a glance, it appears that all the error logs share the same version number. Sure enough, if we look at the version field in the Fields list, we can see that only one version is scoped to the current error logs:
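To make this field-driven workflow concrete, here’s roughly what one of these log records might look like once it lands in Log Observer. The field names (service.name, severity, k8s.cluster.name, version, and the trace_id we’ll use in a moment) come straight from this walkthrough, but the shape and values below are illustrative rather than an exact Splunk event:

```yaml
# Illustrative log record (shape and values are examples only)
body: "Failed payment processing through ButtercupPayments: Invalid API Token(test-20e26e90-356b-432e-a2c6-956fc03f5609)"
severity: error
service.name: paymentservice
k8s.cluster.name: my-k8s-cluster                 # placeholder cluster name
k8s.pod.name: paymentservice-6c9f8d7b4-xk2lp     # hypothetical pod name
version: v350.10                                 # hypothetical release version
trace_id: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d       # links this log to its trace in Splunk APM
```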
Before we jump to resolutions, we can continue to interact with the log details and move through our system. We can explore the traces related to this error to see where in our code it’s being thrown, which could help us track down any recent code changes that may have caused it. Simply click the trace_id in the log details, then View trace_id, to jump back to the trace in Splunk APM:
If we open up one of the span errors and go into Tag Spotlight for the version trace property, we can confirm our suspicion that only our latest release is experiencing this “Invalid API Token” error:
If we had first discovered the error while investigating this trace, we could instead have reached Log Observer via the Related Content from either the trace:
Or the Tag Spotlight:
We used Log Observer Connect to easily locate the cause of our errors. Thanks to the ability to move between Splunk Infrastructure Monitoring, APM, and Log Observer, we were able to confidently move forward with a fix 🎉.
If you want to connect your Splunk Enterprise or Splunk Cloud Platform logs to Splunk Observability Cloud using Splunk Log Observer Connect, again, check out the Introduction to Splunk Log Observer Connect.
New to Splunk and want to get started with Splunk Observability Cloud? Start a 14-day free trial!