
How To Build a Self-Service Observability Practice with Splunk Observability Cloud

CaitlinHalla
Splunk Employee

If you’ve read our previous post on self-service observability, you already know what it is and why it matters. Self-service observability empowers teams to own their observability practice, eliminating cross-team dependencies and bottlenecks, optimizing delivery, and speeding up incident resolution. It also empowers your engineers to customize observability for their application’s unique needs.

In this post, we’ll walk through how to implement self-service observability using Splunk Observability Cloud.

Start with Team Structure and Access Control

To prevent teams from fumbling through an incident while sifting through an explosion of dashboards and charts, team structure and organization within your observability platform are critical. A successful self-service observability practice starts with an organized team structure.

In Splunk Observability Cloud, Teams are a way to organize users into functional groups and efficiently connect them to the dashboards, charts, detectors, and alerts that matter to them. To create a Team in the UI (a Terraform alternative is sketched after these steps): 

  • Select Settings in the left navigation menu
  • Go to Team Management 
  • Select Create team, enter a name and description, and then Add members
  • Save the Team by selecting Create team
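
If you prefer to manage Teams as code, the Splunk Terraform provider can create them as well. Here’s a minimal sketch, assuming the provider’s signalfx_team resource; the team name and member IDs are placeholders:

# Minimal Team definition (assumes the Splunk/SignalFx Terraform provider is configured)
resource "signalfx_team" "checkout_team" {
  name        = "Checkout Team"
  description = "Owns the checkout service and its observability resources"

  # Placeholder member IDs -- replace with the IDs of your team members
  members = [
    "ABC123",
    "DEF456",
  ]
}

Keeping Team membership in source control makes changes reviewable and pairs naturally with the observability-as-code approach covered below.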


Team members will have access to a dedicated landing page that contains relevant information tailored to their needs. A landing page brings together team-specific:

  • Dashboard groups
  • Detectors triggered by team-linked alerts
  • Favorited and mirrored dashboards (mirrors share a centralized dashboard that is editable across teams, while per-team filters and customizations don’t affect other mirrors)


This setup helps everyone stay aligned and focused, whether during incidents or proactive monitoring. 

For each team, team managers can be assigned to handle user management and configure the proper roles, including Admin, Power, Usage, and Read-only. This way, people only have access to the things they need. 

Access controls can be set using Splunk Observability Cloud Enterprise Edition’s built-in RBAC by:

  • Navigating to Settings and selecting Access Tokens
  • Creating team-specific access tokens with the appropriate permissions
  • Setting token usage limits to cap how much data each team can send (see the Terraform sketch after these steps)
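
Tokens can be provisioned as code as well. The sketch below assumes the provider’s signalfx_org_token resource and its host_or_usage_limits block; the limit fields that apply depend on your subscription type, so treat the field names and values as placeholders to verify against the provider documentation:

resource "signalfx_org_token" "checkout_ingest" {
  name        = "checkout-team-ingest"
  description = "Ingest token for the Checkout Team"

  # Notify the team before a limit is reached (assumed notification format)
  notifications = ["Email,checkout-oncall@example.com"]

  # Usage limits and alerting thresholds for this token (assumed field names)
  host_or_usage_limits {
    custom_metrics_limit                  = 1000
    custom_metrics_notification_threshold = 900
  }
}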


This granular token provisioning protects your environment, enables usage tracking per team, and helps to proactively alert on usage thresholds so your teams can build and use the resources they need without unknowingly impacting cost. 

Standardized Templates & Observability as Code

Templating and automating the creation of observability resources increases the likelihood of adoption while also facilitating standardization through easily shareable resources. There are different ways to build templates, such as using the Splunk Observability Cloud API or the Splunk Terraform Provider. As long as your resources are in source control, you get checks and balances, rollback capabilities, and higher developer velocity (because teams can focus on writing feature code rather than hand-building observability resources). 

Here’s an example of how to use the Splunk Terraform provider to create a dashboard and chart in Splunk Observability Cloud: 

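A minimal sketch of that setup might look like the following, using the provider’s signalfx_dashboard_group, signalfx_time_chart, and signalfx_dashboard resources (the metric name and SignalFlow program are placeholders, and the team reference assumes the signalfx_team resource sketched earlier):

# Dashboard group owned by the team; the teams attribute surfaces its
# dashboards on that Team's landing page
resource "signalfx_dashboard_group" "checkout" {
  name        = "Checkout Service"
  description = "Dashboards for the checkout service"
  teams       = [signalfx_team.checkout_team.id]
}

# A chart built from a SignalFlow program -- the metric name is a placeholder
resource "signalfx_time_chart" "checkout_latency" {
  name         = "Checkout latency"
  plot_type    = "LineChart"
  program_text = <<-EOF
    data("http.server.duration", filter=filter("service.name", "checkout")).mean().publish(label="Latency")
  EOF
}

# The dashboard that pulls the chart into the group
resource "signalfx_dashboard" "checkout_overview" {
  name            = "Checkout Overview"
  dashboard_group = signalfx_dashboard_group.checkout.id

  chart {
    chart_id = signalfx_time_chart.checkout_latency.id
    width    = 12
    height   = 1
    row      = 0
    column   = 0
  }
}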

Teams can also be specified from within the Terraform definitions so that dashboards show up within the specified Team’s landing page. 

If you’d like to learn more about Observability as Code, check out our blog posts: Observability as Code and Let’s Talk Terraform.

Implement OpenTelemetry

To enable self-service observability, you need to standardize the collection, processing, and export of telemetry data. OpenTelemetry provides this standardization through semantic conventions, including:

  • Tagging conventions: service.name, deployment.environment, team
  • Metric and span naming standards: http.server.duration
  • Attribute requirements: cloud.provider, region, k8s.cluster.name

When telemetry data is standardized, teams using observability resources can confidently find and understand their data, dashboards, and alerts. These resources are then reusable and familiar to users across teams, making it possible for incident responders to quickly jump in and provide assistance in emergencies. Using semantic conventions also helps your observability tooling aggregate across different cloud providers, frameworks, and programming languages to give a unified view of your business.
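
To make this concrete, here’s a rough sketch of how standardized attributes let a single chart definition serve every team, with only a variable value changing per team (the metric name, attribute values, and variable are illustrative, not part of the original example):

# Because every service reports the same standardized attributes, this one
# chart definition can be reused by any team -- only var.service_name changes
variable "service_name" {
  type    = string
  default = "checkout"
}

resource "signalfx_time_chart" "service_latency" {
  name      = "${var.service_name} latency (production)"
  plot_type = "LineChart"

  program_text = <<-EOF
    data("http.server.duration",
         filter=filter("service.name", "${var.service_name}") and filter("deployment.environment", "production")
    ).mean(by=["k8s.cluster.name"]).publish(label="Latency")
  EOF
}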

Measure, Iterate, Expand

Achieving self-service observability is a journey. It is an incremental process that requires measuring adoption, identifying gaps, and iterating. Tracking adoption metrics, usage metrics, and cost can help you fine-tune your practice and deliver complete visibility into your systems. We recommend tracking the key metrics below to verify that teams are using observability, to understand and control costs, and to help justify your investment in observability.

Adoption metrics to track: 

  • Percentage of teams with at least one active detector
  • Number of dashboards or detectors created per team
  • Time from alert trigger to acknowledgement
  • Support ticket volumes (this should decrease as your observability practice grows)

Usage metrics to track: 

  • Metric cardinality
  • Time series volume
  • Whether or not specific metrics are being used

Cost optimizations:

  • Filter unused or overly granular metrics
  • Set rules for aggregation or dropping metrics
  • Route data to reduce ingestion and storage costs

Avoid Common Pitfalls

Finally, when implementing a self-service observability practice, it’s essential to avoid common problem areas that can lead to chaos, increased cost, and operational silos. Below are some key pitfalls to watch out for, along with strategies to avoid them. 

Don’t skip documentation

Every observability resource should have a clear description, contact information for the owning team, and links to runbooks where applicable. It might sound extreme, but a runbook should accompany every detector. Here’s a Terraform example of a detector with proper documentation: 

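A sketch of what that could look like, assuming the provider’s signalfx_detector resource (the signal, threshold, runbook URL, and notification target are placeholders):

resource "signalfx_detector" "checkout_error_rate" {
  name        = "Checkout error rate too high"
  description = "Fires when checkout errors exceed 5% of requests for 10 minutes. Owned by the Checkout Team (#checkout-oncall)."

  # Link the detector to the owning Team so it appears on their landing page
  teams = [signalfx_team.checkout_team.id]

  # Placeholder SignalFlow program -- replace with your own error-rate signal
  program_text = <<-EOF
    errors = data("checkout.errors").sum()
    total  = data("checkout.requests").sum()
    detect(when(errors / total > 0.05, lasting="10m")).publish("Checkout error rate above 5%")
  EOF

  rule {
    detect_label  = "Checkout error rate above 5%"
    severity      = "Critical"
    runbook_url   = "https://example.com/runbooks/checkout-error-rate"
    tip           = "Check recent deploys and payment-gateway status; full remediation steps are in the runbook."
    notifications = ["Email,checkout-oncall@example.com"]
  }
}

With the description, tip, and runbook link defined on the detector itself, that context travels with every alert it fires.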

Don’t ignore costs

With each team managing its own observability, there is the potential for a telemetry data explosion: a rapid increase in the amount of telemetry data being exported to the backend observability platform. This explosion of data not only makes it difficult to process and analyze, but it also increases costs. 

Within Splunk Observability Cloud, you can always get detailed insight to monitor and manage subscription usage and billing and ensure you stay within your defined limits. 


In the Enterprise Edition of Splunk Observability Cloud, Metrics Pipeline Management (MPM) provides a centralized way to manage metrics and helps you easily identify metric usage in-product. You can then address metric explosion in-product or adjust the metric in code, and even adjust storage settings to lower costs and improve monitoring performance. 

Don’t create silos

Giving every team control over its observability practice can quickly devolve into chaos if standards aren’t aligned. Avoid silos and promote a unified strategy by: 

  • Using shared naming conventions
  • Providing shared templates
  • Promoting best practices through enablement (office hours, onboarding sessions, communication channels, internal demos, lunch-and-learns)

Always encourage cross-team collaboration and create opportunities for shared learning to ensure your self-service observability practice grows and thrives. 

Wrap Up

With an organized team structure, consistent telemetry powered by OpenTelemetry, and observability as code, your teams will not only be able to move faster, but they will also be empowered with the insights they need to respond to issues with greater efficiency and confidence. 

Ready to take the first step in your self-service observability journey? Start building your self-service observability practice today with Splunk Observability Cloud’s 14-day free trial.

