If you’ve read our previous post on self-service observability, you already know what it is and why it matters. Self-service observability empowers teams to own their observability practice, which eliminates cross-team dependencies and bottlenecks, streamlines delivery, and speeds up incident resolution. It also lets your engineers customize observability for their application’s unique needs.
In this post, we’ll walk through how to implement self-service observability using Splunk Observability Cloud.
Team structure and organization within an observability platform are critical: without them, teams fumble through incidents while sifting through an explosion of dashboards and charts. A successful self-service observability practice is organized, and that organization starts with team structure.
In Splunk Observability Cloud, Teams are a way to organize users into functional groups and efficiently connect them to the dashboards, charts, detectors, and alerts that matter to them. To create a Team:
Team members will have access to a dedicated landing page that contains relevant information tailored to their needs. A landing page brings together team-specific:
This setup helps everyone stay aligned and focused, whether during incidents or proactive monitoring.
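Teams can also be created programmatically rather than through the UI. The Python sketch below only builds the HTTP request for what we understand to be the Teams endpoint of the Splunk Observability Cloud REST API (`POST https://api.<realm>.signalfx.com/v2/team`, authenticated with an `X-SF-Token` header); it does not send it. The realm, token, and member IDs are placeholders, and the payload fields (`name`, `description`, `members`) are assumptions to verify against the API reference before use.

```python
import json
import urllib.request


def build_create_team_request(realm: str, token: str, name: str,
                              description: str, member_ids: list) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Team.

    Assumption: the Teams endpoint lives at /v2/team on the realm's API host
    and accepts a name, a description, and a list of member user IDs.
    """
    payload = {
        "name": name,
        "description": description,
        "members": member_ids,  # user IDs rather than email addresses (assumption)
    }
    return urllib.request.Request(
        url=f"https://api.{realm}.signalfx.com/v2/team",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-SF-Token": token,  # placeholder: a token allowed to manage teams
        },
        method="POST",
    )


req = build_create_team_request(
    realm="us1",
    token="YOUR_API_TOKEN",
    name="payments",
    description="Owns the checkout and refunds services",
    member_ids=["USER_ID_1", "USER_ID_2"],
)
# urllib.request.urlopen(req) would actually send the request.
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to review and test before it touches a real organization.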
For each team, you can assign team managers who handle user management and configure the proper roles (Admin, Power, Usage, and Read-only), so people have access only to what they need.
Access controls can be set using Splunk Observability Cloud Enterprise Edition’s built-in RBAC by:
This granular token provisioning protects your environment, enables per-team usage tracking, and lets you alert proactively on usage thresholds, so your teams can build and use the resources they need without unknowingly driving up cost.
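The threshold logic behind per-team usage alerts is easy to reason about in isolation. This is a minimal Python sketch, assuming you have already collected each team token’s current usage and its configured limit; the `TokenUsage` type, the datapoints-per-minute unit, and the 80% default are all hypothetical. In practice you would more likely configure this in-product; the script only illustrates the check itself.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    team: str     # team that owns the access token
    used: float   # current usage, e.g. datapoints per minute (hypothetical unit)
    limit: float  # limit configured on that team's token


def teams_over_threshold(usages, threshold=0.8):
    """Return the teams whose usage is at or above `threshold` of their token limit.

    Tokens without a positive limit are skipped instead of dividing by zero.
    """
    return sorted(
        u.team for u in usages
        if u.limit > 0 and u.used / u.limit >= threshold
    )


usages = [
    TokenUsage("payments", used=850, limit=1000),
    TokenUsage("search", used=120, limit=1000),
    TokenUsage("sandbox", used=50, limit=0),
]
print(teams_over_threshold(usages))  # ['payments']
```

Alerting before a limit is reached, rather than when it is hit, gives a team time to tune its telemetry before data is dropped or costs spike.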
Templating and automating the creation of observability resources makes adoption more likely and promotes standardization through easily shareable resources. There are different ways to build templates, such as the Splunk Observability Cloud API or the Splunk Terraform Provider. Once your resources are in source control, you gain checks and balances, rollback capabilities, and higher developer velocity, because teams can focus on writing feature code rather than creating observability resources.
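To illustrate templating, here is a hedged Python sketch that stamps out one standardized chart per service from a single template and emits it in Terraform’s JSON configuration syntax (`.tf.json` files), which Terraform reads alongside `.tf` files. The `signalfx_time_chart` resource and its `name` and `program_text` arguments come from the Splunk (SignalFx) Terraform provider; the metric name `http.server.request.count`, the `service.name` dimension filter, and the service names are placeholders.

```python
import json


def time_chart(service: str, metric: str) -> dict:
    """One standardized signalfx_time_chart body, parameterized by service."""
    return {
        "name": f"{service}: {metric}",
        # SignalFlow program; filtering on service.name assumes the metric carries that dimension
        "program_text": (
            f"data('{metric}', filter=filter('service.name', '{service}'))"
            ".sum().publish(label='A')"
        ),
    }


def render_charts(services, metric: str) -> dict:
    """Render Terraform JSON syntax with one chart resource per service."""
    # Terraform resource names are kept to letters, digits, and underscores here
    charts = {svc.replace("-", "_"): time_chart(svc, metric) for svc in services}
    return {"resource": {"signalfx_time_chart": charts}}


config = render_charts(["checkout", "cart-api"], metric="http.server.request.count")
print(json.dumps(config, indent=2))  # e.g. write this output to charts.tf.json
```

Because every service gets the same reviewed template, a fix to the template propagates to every team on the next apply.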
Here’s an example of how to use the Splunk Terraform provider to create a dashboard and chart in Splunk Observability Cloud:
Teams can also be specified from within the Terraform definitions so that dashboards show up within the specified Team’s landing page.
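As a hedged sketch of linking dashboards to a team, the Python below renders Terraform JSON syntax for a `signalfx_team`, a `signalfx_dashboard_group` whose `teams` argument references that team, and a `signalfx_dashboard` in that group. In JSON syntax, references such as `${signalfx_team.payments.id}` are written as interpolation strings. The team and chart keys are placeholders, and the sketch assumes `signalfx_time_chart` resources with those keys are defined elsewhere in the configuration.

```python
import json


def team_dashboard(team_key: str, team_name: str, chart_keys) -> dict:
    """Terraform JSON for a team, a dashboard group linked to it, and one dashboard."""
    return {
        "resource": {
            "signalfx_team": {
                team_key: {"name": team_name},
            },
            "signalfx_dashboard_group": {
                team_key: {
                    "name": f"{team_name} dashboards",
                    # Linking the group to the team surfaces it on the team's landing page
                    "teams": [f"${{signalfx_team.{team_key}.id}}"],
                },
            },
            "signalfx_dashboard": {
                f"{team_key}_overview": {
                    "name": f"{team_name} overview",
                    "dashboard_group": f"${{signalfx_dashboard_group.{team_key}.id}}",
                    # One full-width row per chart on the dashboard's 12-column grid
                    "chart": [
                        {"chart_id": f"${{signalfx_time_chart.{key}.id}}",
                         "width": 12, "height": 1, "row": row, "column": 0}
                        for row, key in enumerate(chart_keys)
                    ],
                },
            },
        }
    }


config = team_dashboard("payments", "Payments", ["checkout", "cart_api"])
print(json.dumps(config, indent=2))
```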
If you’d like to learn more about Observability as Code, check out our blog posts: Observability as Code and Let’s Talk Terraform.
To enable self-service observability, you need to standardize the collection, processing, and export of telemetry data. OpenTelemetry provides this standardization through semantic conventions, including:
When telemetry data is standardized, teams using observability resources can confidently find and understand their data, dashboards, and alerts. These resources are then reusable and familiar to users across teams, making it possible for incident responders to quickly jump in and provide assistance in emergencies. Using semantic conventions also helps your observability tooling aggregate across different cloud providers, frameworks, and programming languages to give a unified view of your business.
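The Python sketch below shows one way to check telemetry against a shared standard: it renames ad-hoc resource attribute keys to their OpenTelemetry semantic-convention names and reports required keys that are still missing. The required set and the alias table are hypothetical choices for illustration. Recent semantic-convention releases rename `deployment.environment` to `deployment.environment.name`, so match whichever version your organization has standardized on. In a real pipeline you would more likely apply such renames in the OpenTelemetry Collector, for example with its resource or attributes processor.

```python
# Keys this sketch treats as required; adjust to your organization's standard.
REQUIRED = ("service.name", "service.version", "deployment.environment")

# Hypothetical ad-hoc keys, mapped to their semantic-convention names.
ALIASES = {
    "app": "service.name",
    "service": "service.name",
    "version": "service.version",
    "env": "deployment.environment",
    "environment": "deployment.environment",
}


def normalize(attrs: dict) -> dict:
    """Rename ad-hoc keys to semantic-convention keys; explicit convention keys win."""
    out = {k: v for k, v in attrs.items() if k not in ALIASES}
    for key, value in attrs.items():
        if key in ALIASES:
            out.setdefault(ALIASES[key], value)
    return out


def missing(attrs: dict) -> list:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED if not attrs.get(key)]


attrs = normalize({"app": "checkout", "env": "prod", "team": "payments"})
print(attrs)
print(missing(attrs))  # ['service.version']
```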
Achieving self-service observability is a journey: an incremental process of measuring adoption, identifying gaps, and iterating. Tracking adoption, usage, and cost metrics helps you fine-tune your practice and achieve complete visibility into your systems. We recommend tracking the key metrics below to verify that teams are actually using observability, to understand and control costs, and to help justify your investment in observability.
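As one example of an adoption metric, the Python sketch below computes, per team, the share of services that have both at least one dashboard and at least one detector. The inputs are hypothetical; in practice you might assemble them from your source-controlled Terraform definitions or from the platform’s API.

```python
def coverage_by_team(owners: dict, with_dashboard: set, with_detector: set) -> dict:
    """Fraction of each team's services that have a dashboard AND a detector.

    owners maps service name -> owning team.
    """
    totals, covered = {}, {}
    for service, team in owners.items():
        totals[team] = totals.get(team, 0) + 1
        if service in with_dashboard and service in with_detector:
            covered[team] = covered.get(team, 0) + 1
    return {team: covered.get(team, 0) / count for team, count in totals.items()}


owners = {"checkout": "payments", "refunds": "payments", "search": "discovery"}
print(coverage_by_team(owners, {"checkout", "search"}, {"checkout"}))
# {'payments': 0.5, 'discovery': 0.0}
```

Tracked over time, a number like this shows which teams have adopted the practice and where to focus enablement.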
Adoption metrics to track:
Usage metrics to track:
Cost optimizations:
Finally, when implementing a self-service observability practice, it’s essential to avoid common problem areas that can lead to chaos, increased cost, and operational silos. Below are some key pitfalls to watch out for, along with strategies to avoid them.
Every observability resource should have a clear description, contact information for the owning team, and links to runbooks where applicable. It might sound extreme, but a runbook should accompany every detector. Here’s a Terraform example of a detector with proper documentation:
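A CI check can enforce this rule automatically. The hedged Python sketch below inspects a detector expressed as a dict whose keys mirror `signalfx_detector` arguments in the Splunk Terraform provider (`description`, `teams`, and per-rule `detect_label` and `runbook_url`). How you extract those dicts, for example from a plan rendered with `terraform show -json`, is left out, and the owner channel and runbook URL are placeholders.

```python
def documentation_problems(detector: dict) -> list:
    """List documentation gaps in a detector given as a dict of signalfx_detector arguments."""
    name = detector.get("name", "<unnamed detector>")
    problems = []
    if not detector.get("description"):
        problems.append(f"{name}: missing description")
    if not detector.get("teams"):
        problems.append(f"{name}: no owning team")
    for rule in detector.get("rule", []):
        label = rule.get("detect_label", "<unlabeled rule>")
        if not rule.get("runbook_url"):
            problems.append(f"{name}/{label}: missing runbook_url")
    return problems


detector = {
    "name": "checkout p99 latency",
    "description": "Fires when checkout p99 latency stays high. Owner: #payments-oncall",
    "teams": ["TEAM_ID"],
    "rule": [
        {"detect_label": "p99 high", "severity": "Critical",
         "runbook_url": "https://example.com/runbooks/checkout-latency"},
        {"detect_label": "p99 elevated", "severity": "Warning"},
    ],
}
print(documentation_problems(detector))
# ['checkout p99 latency/p99 elevated: missing runbook_url']
```

Failing the build on a non-empty list keeps undocumented detectors from ever reaching on-call engineers.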
With each team managing its own observability, there is potential for a telemetry data explosion: a rapid increase in the volume of telemetry data exported to a backend observability platform. That explosion makes the data harder to process and analyze, and it also drives up costs.
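A quick back-of-the-envelope calculation shows how an explosion happens. Each unique combination of a metric and its dimension values is a separate metric time series (MTS), so the worst case for one metric is the product of the number of distinct values of each dimension. The dimension counts below are made up, and real counts are usually lower because not every combination occurs.

```python
from math import prod


def max_time_series(cardinalities: dict) -> int:
    """Worst-case number of metric time series (MTS) for one metric.

    Each unique combination of dimension values is a separate MTS, so the
    upper bound is the product of each dimension's distinct-value count.
    """
    return prod(cardinalities.values())


base = {"region": 3, "host": 10, "endpoint": 5}
print(max_time_series(base))                         # 150
print(max_time_series({**base, "user.id": 10_000}))  # 1500000
```

Adding a single unbounded dimension such as a user ID multiplies the upper bound by its cardinality, which is why such dimensions deserve review before they ship.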
Within Splunk Observability Cloud, you can get detailed insight into subscription usage and billing, so you can monitor and manage both and stay within your defined limits:
In the Enterprise Edition of Splunk Observability Cloud, Metrics Pipeline Management (MPM) provides a central place to manage metrics and makes it easy to identify metric usage in-product. You can then address metric explosion in-product or change the metric in code, and you can even adjust storage settings to lower costs and improve monitoring performance.
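Conceptually, taming metric explosion often means aggregating away a high-cardinality dimension before storage. The Python sketch below is a plain illustration of that roll-up idea, not how MPM implements it: it collapses three time series into two by dropping a hypothetical `user.id` dimension and summing the values that remain.

```python
from collections import defaultdict


def roll_up(datapoints, drop: set) -> dict:
    """Sum datapoints that share every dimension except those listed in `drop`.

    Each datapoint is (dimensions, value). Summing suits counts; gauges would
    need a different aggregation, such as max or mean.
    """
    totals = defaultdict(float)
    for dims, value in datapoints:
        key = tuple(sorted((k, v) for k, v in dims.items() if k not in drop))
        totals[key] += value
    return dict(totals)


points = [
    ({"endpoint": "/cart", "user.id": "u1"}, 1),
    ({"endpoint": "/cart", "user.id": "u2"}, 2),
    ({"endpoint": "/pay", "user.id": "u1"}, 3),
]
rolled = roll_up(points, drop={"user.id"})
print(rolled)  # {(('endpoint', '/cart'),): 3.0, (('endpoint', '/pay'),): 3.0}
```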
Giving every team control over its observability practice can quickly devolve into chaos if teams don’t align on standards. Avoid silos and promote a unified strategy by:
Always encourage cross-team collaboration and create opportunities for shared learning to ensure your self-service observability practice grows and thrives.
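Shared standards are easiest to keep when they are checked automatically. As one hedged example, the Python sketch below validates a hypothetical naming convention for dashboards and detectors, `[team] service: summary`, so any resource can be traced back to its owning team and service; substitute whatever convention your teams agree on.

```python
import re

# Hypothetical shared convention: "[team] service: summary"
NAME_PATTERN = re.compile(r"^\[(?P<team>[a-z0-9-]+)\] (?P<service>[a-z0-9-]+): \S.*$")


def parse_name(name: str):
    """Return (team, service) if the name follows the convention, else None."""
    match = NAME_PATTERN.match(name)
    return (match["team"], match["service"]) if match else None


print(parse_name("[payments] checkout: p99 latency"))  # ('payments', 'checkout')
print(parse_name("Checkout latency (new)"))            # None
```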
With an organized team structure, consistent telemetry powered by OpenTelemetry, and observability as code, your teams will not only be able to move faster, but they will also be empowered with the insights they need to respond to issues with greater efficiency and confidence.
Ready to take the first step in your self-service observability journey? Start building your self-service observability practice today with Splunk Observability Cloud’s 14-day free trial.