
The OpenTelemetry Certified Associate (OTCA) Exam

CaitlinHalla
Splunk Employee

What’s this OTCA exam?

The Linux Foundation offers the OpenTelemetry Certified Associate (OTCA) credential to confirm expertise in OpenTelemetry and observability practices. The exam might be of interest to you if you know and love observability best practices and are one of the following:

  • Software developer
  • DevOps and SRE practitioner
  • Cloud architect
  • Technical professional with a passion for OpenTelemetry

With the OTCA certification, your toolbox will be overflowing with an understanding of:

  • Core observability concepts and data types
  • OpenTelemetry API and SDK implementations
  • Deploying and managing OpenTelemetry Collector pipelines
  • Debugging and maintaining observability setups

Sound like something you're into? Let’s break down the certification structure and then get into how to prepare.

OTCA Domains & Competencies

According to The Linux Foundation’s OTCA certification page, an OpenTelemetry Certified Associate understands the basic concepts of observability, and how the OpenTelemetry project enables these concepts. The exam focuses on the major components of OpenTelemetry, how they are used, and the best practices for instrumenting cloud-native applications for observability with OpenTelemetry.

The exam is broken down into four main sections:

  1. Fundamentals of Observability (18%):
    1. Telemetry data
    2. Semantic conventions
    3. Instrumentation
    4. Analysis and outcomes
  2. The OpenTelemetry API and SDK (46%):
    1. Data model
    2. Composability and extension
    3. Configuration
    4. Signals (metrics, logs, traces)
    5. SDK pipelines
    6. Context propagation
    7. Agents
  3. The OpenTelemetry Collector (26%):
    1. Configuration
    2. Deployment
    3. Scaling
    4. Pipelines
    5. Transforming data
  4. Maintaining and Debugging Observability Pipelines (10%):
    1. Context propagation
    2. Debugging pipelines
    3. Error handling
    4. Schema management

The details

To pass, you’ll need a minimum score of 75%. The exam consists of 60 multiple-choice questions, and you’ll have 90 minutes to complete it. It’s online and proctored, so you’ll need a clean, quiet workspace; you’ll chat with your proctor to make sure your testing environment meets all the stated requirements.

You’ll be allowed two testing attempts, and both attempts are included in the $250 test cost.

Still sounds good? Let’s jump into what and how to study.

Study guide

Based on my research, my teammate Moss’ study guide, and a splash of AI, here are some key components and areas to focus on when studying for the exam. This is by no means comprehensive, so I strongly encourage you to use it as a jumping-off point in your own study journey.

Domain 1: Fundamentals of Observability (18%)

  • Core Signals: Traces, Metrics, and Logs ("Events" are often part of logs/spans but are not a standalone core signal in the same way).
  • The "Why": OpenTelemetry is vendor-agnostic, allowing users to switch backends without re-instrumenting.
  • Semantic Conventions: A shared naming scheme (e.g., service.name, http.response.status_code, messaging.system).
    • Schema URL: Included in telemetry to declare which version of semantic conventions is being followed.
    • If schemas don’t match:

| Situation | Recommended Tool | Action |
| --- | --- | --- |
| OTel Version Mismatch | schema processor | Set a target version; the processor handles the rest. |
| Custom/Legacy Mapping | transform processor | Write OTTL statements to rename or move attributes. |
| Simple Renaming | attributes processor | Use the upsert or insert actions for basic key changes. |
| Data Cleaning | filter processor | Drop attributes or spans that don't fit the destination schema. |

  • Observability Backends: Tools like Jaeger (traces), Zipkin (traces), Prometheus (metrics), Grafana Loki (logs) that receive and store OTel data.

Domain 2: The OpenTelemetry API and SDK (46%)

API vs. SDK

  • API: Defines how telemetry is generated. Use only the opentelemetry-api package when writing libraries so you don’t force SDK dependencies on your users.
  • SDK: The language-specific implementation of the API. It handles the processing and exporting. The SDK is separate from the API and can be swapped at runtime without changing instrumentation code.
  • In existing legacy code bases, you can use the API and the SDK together to manage and export the data (see the sketch below).
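
To make the split concrete, here’s a minimal Python sketch (the module and function names, like acme_payments and charge, are hypothetical): the library creates spans through the API only, while the application wires up the SDK so those spans are actually processed and exported.

```python
# library code (e.g., acme_payments.py): depends only on opentelemetry-api
from opentelemetry import trace

tracer = trace.get_tracer("acme_payments")  # no-op unless an application installs an SDK


def charge(amount: float) -> None:
    with tracer.start_as_current_span("charge") as span:
        span.set_attribute("payment.amount", amount)
        # ... business logic ...


# application code: wires up the SDK at startup and can be swapped without
# touching the library's instrumentation
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

charge(9.99)  # the library's spans now flow through the SDK pipeline
```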

Traces and Spans

  • Span: Represents a single operation. A SpanID identifies a unique span; a TraceID links spans across services. TraceIDs must be passed to services to create traces that can be viewed in observability tooling (see Context Propagation, below).
  • Span Hierarchy: Spans form a parent-child hierarchy. A Child Span is created when a service receives a request from an upstream traced service.
  • Span Kinds:
    • Producer: Sending a message to a broker.
    • Consumer: Processing a message from a broker.
  • Span Links: Used to associate spans that are causally related but not in a direct parent-child relationship (e.g., fan-out/batch processing).
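
Here’s a minimal Python sketch of span kinds and span links, assuming a hypothetical publish/consume flow: the consumer span isn’t a child of the producer span, so a link records the causal relationship instead.

```python
from opentelemetry import trace
from opentelemetry.trace import Link, SpanKind

tracer = trace.get_tracer("example.messaging")

# PRODUCER span: publishing a message to a broker
with tracer.start_as_current_span("publish order", kind=SpanKind.PRODUCER) as producer:
    producer_ctx = producer.get_span_context()  # carries the TraceID/SpanID

# CONSUMER span: processed later (possibly batched), so it isn't a child of the
# producer span; a span link preserves the causal relationship instead
with tracer.start_as_current_span(
    "process order",
    kind=SpanKind.CONSUMER,
    links=[Link(producer_ctx)],
):
    pass  # ... handle the message ...
```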

Metrics Instruments

  • Counter: Monotonic (only goes up).
  • UpDownCounter: Can increase or decrease (e.g., queue length).
  • Gauge: Captures a non-additive value at a point in time (e.g., CPU usage).
  • Histogram: Records a distribution of values into "buckets" (e.g., request latency).
  • Meter: The API component responsible for creating instruments. The MeterProvider is the factory for Meters; each Meter, in turn, is the factory for Instruments (Counters, Histograms, etc.).
  • Views: Used in the SDK to rename, aggregate, or filter attributes of metrics before they are exported.
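
A minimal Python sketch of the MeterProvider → Meter → Instrument chain, plus a View; the instrument names and attributes here are illustrative, not prescribed.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader
from opentelemetry.sdk.metrics.view import View

# A View renames a metric before export (it can also change aggregation or drop attributes)
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
provider = MeterProvider(
    metric_readers=[reader],
    views=[View(instrument_name="checkout.latency", name="checkout.latency.seconds")],
)
metrics.set_meter_provider(provider)

# MeterProvider -> Meter -> Instruments
meter = metrics.get_meter("example.app")
requests = meter.create_counter("checkout.requests")        # monotonic: only goes up
queue_len = meter.create_up_down_counter("checkout.queue")  # can increase or decrease
latency = meter.create_histogram("checkout.latency")        # bucketed distribution

requests.add(1, {"http.route": "/checkout"})
queue_len.add(-1)
latency.record(0.123)
```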

Context Propagation & Baggage

  • Context Propagation: Passing trace/context info across service boundaries (generic for all signals).
  • Default propagators:
    • W3C Trace Context: The default standard; uses the traceparent header.
    • Baggage: A key-value store that travels with the context across services. Unlike span attributes, baggage is propagated over the wire.
  • Propagators: Configured via OTEL_PROPAGATORS (e.g., tracecontext, baggage).
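
A minimal Python sketch of propagation and baggage, assuming a hypothetical client/server pair that shares a headers dict: inject() writes the traceparent and baggage headers from the current context, and extract() rebuilds that context on the receiving side.

```python
from opentelemetry import baggage, context, trace
from opentelemetry.propagate import extract, inject

tracer = trace.get_tracer("example.client")

# Client side: add a baggage entry to the current context, then inject the
# active span context + baggage into outgoing headers (traceparent, baggage)
token = context.attach(baggage.set_baggage("tenant.id", "acme"))
headers = {}
with tracer.start_as_current_span("outbound call"):
    inject(headers)
context.detach(token)

# Server side: extract the incoming context so the new span becomes a child of
# the remote span, and read the propagated baggage
incoming = extract(headers)
with tracer.start_as_current_span("handle request", context=incoming):
    print(baggage.get_baggage("tenant.id", incoming))  # -> "acme"
```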

Resources & Instrumentation

  • Resources: Attributes describing the entity producing telemetry (e.g., the KubernetesResourceDetector adds pod info such as container.id or k8s.pod.uid). If two detectors set the same attribute, the last detector wins.
  • Zero-Code Instrumentation: Collecting data without modifying application code (e.g., Java Agent or OTel Operator).
  • Code-Based: Manual instrumentation using the API.
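
A short Python sketch of resources and merge precedence (the service names and attribute values are made up): when two sources set the same attribute, the later one wins, mirroring the detector ordering described above.

```python
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Attributes describing the entity that produces the telemetry
base = Resource.create({"service.name": "checkout", "service.version": "1.4.2"})

# Constructed directly to mimic a detector's output; on a key conflict,
# merge() lets the updating (later) resource win
detected = Resource({"service.version": "1.5.0", "k8s.pod.uid": "abc-123"})
merged = base.merge(detected)  # service.version is now "1.5.0"

provider = TracerProvider(resource=merged)  # every span emitted carries these attributes
```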

Domain 3: The OpenTelemetry Collector (26%)

The Pipeline

  • Components: A pipeline consists of Receivers, Processors, and Exporters.
  • Connectors: A unique component that acts as an Exporter in one pipeline and a Receiver in another. Connectors are used to "bridge" different types of telemetry data or to route data based on specific logic. They are the primary way to turn one signal (like a Trace) into another (like a Metric) within the Collector.
  • Extensions: Components that live outside the data pipeline (e.g., health checks, zpages).

Key Processors

  • Batch Processor: Buffers data to reduce overhead; a flush is triggered when send_batch_size is reached or the timeout elapses.
  • Memory Limiter: Prevents OOM crashes by refusing data when memory limits are hit. Look for the otelcol_processor_refused_spans metric.
  • Filter Processor: Used to drop specific data (e.g., only keep HTTP 500 errors).
  • Transform Processor: Used to sanitize or normalize attribute values.
  • K8s Attributes: Automatically adds pod/container metadata to all signals.

Exporters & Protocols

  • OTLP (OpenTelemetry Protocol): the “native language” of OTel. It’s a general-purpose telemetry delivery protocol that standardizes how traces, metrics, and logs are encoded and transported between your applications, the OTel Collector, and observability backends.
    • Encoding formats: Binary Protobuf, JSON
    • Supports gRPC (Port 4317) and HTTP (Port 4318)
    • Servers MUST support Gzip compression
    • Exporters MUST use exponential back-off with jitter for retries
    • Used for: standardization, vendor neutrality, efficiency (it uses Protocol Buffers for binary serialization), reliability (built-in features for backpressure signaling and retry logic)
  • Prometheus Remote Write: Used to push metrics directly to a Prometheus-compatible backend.
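
A minimal Python sketch of exporting over OTLP/gRPC to a local Collector (the endpoint and insecure flag are assumptions for a local, TLS-less setup); the HTTP/protobuf exporter would target port 4318 instead.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
# gRPC flavor of the OTLP exporter (package: opentelemetry-exporter-otlp-proto-grpc)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# OTLP/gRPC defaults to port 4317; retries with exponential backoff are built in
exporter = OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```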

Deployment & Scaling

  • Agent: Co-located with the app (Sidecar/DaemonSet). Improves gRPC load balancing.
  • Gateway: Centralized instances for a cluster/region.
  • Kubernetes: deployment is the Collector’s default mode; the available modes are:

| Mode | Kubernetes Resource | Best For |
| --- | --- | --- |
| deployment | apps/v1.Deployment | Centralized Gateways & stateless processing. |
| daemonset | apps/v1.DaemonSet | Collecting host metrics and logs from all nodes. |
| statefulset | apps/v1.StatefulSet | Tail-sampling and predictable scaling. |
| sidecar | Injected into Pod | Fargate, serverless, or per-app isolation. |

  • Kubernetes-specific:
    • resourcedetection processor: used to detect cluster-level or host-level information (e.g., the k8s_node detector)
    • kubeletstats receiver: pulls node-, pod-, and container-level metrics (CPU, memory, network) directly from the kubelet API on each node
    • filelog receiver: collects logs from the standard /var/log/pods directory on the host
  • Scaling:
    • Routing key: you configure the exporter to hash based on a specific key
      • For tail sampling: you set routing_key: traceID
      • For span metrics, you might set routing_key: service
    • Scale up when the exporter queue reaches 60-70% capacity.
    • Do not scale if the backend is the bottleneck (indicated by otelcol_exporter_send_failed_spans).
    • Target Allocator: a specific component used when you want to scale Prometheus-style scraping. Automatically distributes scrape targets across multiple Collector replicas to prevent duplicate data
    • Loadbalancing Exporter: Uses DNS to distribute OTLP/gRPC traffic across backend collectors
    • L7 Load Balancer: Required for distributing gRPC traffic properly
    • The loadbalancing exporter is the “secret sauce” for scaling stateful collectors. It doesn’t use simple round-robin; instead, it uses consistent hashing.

| Component | Role in Scaling |
| --- | --- |
| loadbalancing exporter | Uses consistent hashing to keep related spans together. |
| routing_key: traceID | The setting required to make tail sampling work at scale. |
| StatefulSet | The K8s workload used for the processing layer to provide stable identities. |
| Headless Service | Allows the loadbalancing exporter to discover individual pod IPs via DNS. |
| Target Allocator | Distributes Prometheus scrape targets across multiple collectors. |
| Two-Layer Design | Layer 1 (Load Balancer) → Layer 2 (Stateful Processor). |

Domain 4: Sampling and Maintenance (10%)

Sampling Types

  • Head Sampling: Decision (to sample or to drop) made at the start.
    • AlwaysOn: Keeps 100%.
    • TraceIDRatioBased: Keeps a specific percentage (e.g., 10%).
    • ParentBased: Respects the decision of the upstream parent.
  • Tail Sampling: Decision made after the trace is complete. This is stateful and requires careful scaling.
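
A minimal Python sketch of configuring head sampling in the SDK: ParentBased respects the upstream decision, and TraceIdRatioBased keeps roughly 10% of root traces (the 0.10 ratio is just an example).

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Head sampling: the keep/drop decision is made when the root span starts.
# ParentBased defers to the parent's decision when one exists; for root spans
# it falls back to the ratio-based sampler.
sampler = ParentBased(root=TraceIdRatioBased(0.10))

trace.set_tracer_provider(TracerProvider(sampler=sampler))
```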

Critical Environment Variables

| Variable | Purpose |
| --- | --- |
| OTEL_SERVICE_NAME | Sets the logical name of the service. |
| OTEL_EXPORTER_OTLP_ENDPOINT | Points the SDK/Agent to the Collector or SaaS backend. |
| OTEL_TRACES_SAMPLER | Changes the active sampler without recompiling. |
| OTEL_EXPORTER_OTLP_HEADERS | Attaches custom HTTP headers to exports. |
| OTEL_JAVAAGENT_ENABLED=false | Disables the Java agent at runtime. |
| OTEL_NODE_RESOURCE_DETECTORS | Used to disable specific detectors in Node.js. |
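
Several of these variables are read when SDK components are constructed, so they must be set before initialization (normally via the deployment environment rather than code). Here’s a Python sketch; OTEL_TRACES_SAMPLER_ARG is a companion variable, not listed above, that supplies the sampling ratio.

```python
import os

# Set before the SDK is imported/initialized
os.environ["OTEL_SERVICE_NAME"] = "checkout"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://otel-collector:4317"
os.environ["OTEL_TRACES_SAMPLER"] = "traceidratio"
os.environ["OTEL_TRACES_SAMPLER_ARG"] = "0.1"

from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider

# Resource.create() picks up OTEL_SERVICE_NAME; the provider's default sampler
# is built from OTEL_TRACES_SAMPLER / OTEL_TRACES_SAMPLER_ARG when none is passed
provider = TracerProvider(resource=Resource.create())
```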

Debugging & Testing

  • Self-Telemetry: Collector metrics are typically exposed (when enabled) on Port 8888.
    • If the Collector is dropping data...
      • Monitor internal metrics at Port 8888
      • Enable debug logging
      • Enable the zpages extension to get a live, web-based view of trace and metric buffers
      • The Collector’s default behavior depends on where the failure occurred:

| Component | Scenario | Default Action |
| --- | --- | --- |
| Memory Limiter | RAM usage hits the "Hard Limit" | Drop immediately. It refuses to accept any more data from receivers until memory is freed. |
| Exporter Queue | Queue is full (queue_size reached) | Drop immediately. The setting block_on_overflow defaults to false, so new data is rejected rather than making the app wait. |
| Exporter | Transient error (e.g., 503, 429) | Retry. It uses exponential backoff (starting at 5s, up to 300s total) before finally dropping the data. |
| Exporter | Permanent error (e.g., 400, 401) | Drop immediately. The Collector assumes the data is invalid and will never be accepted. |

  • Unit Testing: Use the InMemorySpanExporter (in Python/other SDKs) to assert that spans are created correctly without a backend.
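
Here’s a minimal Python unit-test sketch using InMemorySpanExporter (the test and attribute names are made up): spans are captured in memory, so assertions don’t need a running backend.

```python
import unittest

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter


class TestCheckoutInstrumentation(unittest.TestCase):
    def setUp(self):
        # Capture finished spans in memory instead of exporting them anywhere
        self.exporter = InMemorySpanExporter()
        provider = TracerProvider()
        provider.add_span_processor(SimpleSpanProcessor(self.exporter))
        self.tracer = trace.get_tracer(__name__, tracer_provider=provider)

    def test_checkout_span_has_order_id(self):
        with self.tracer.start_as_current_span("checkout") as span:
            span.set_attribute("order.id", "1234")

        spans = self.exporter.get_finished_spans()
        self.assertEqual(len(spans), 1)
        self.assertEqual(spans[0].name, "checkout")
        self.assertEqual(spans[0].attributes["order.id"], "1234")


if __name__ == "__main__":
    unittest.main()
```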

Read the Docs

At the end of the day, the only real way to study for the OTCA exam is to read the docs (that or use OpenTelemetry in a production environment every day and gain experience with every type of OTel scenario).

Hopefully, the above guide helps distill some of the details provided in the docs, but, again, it’s no substitute for digging into the docs yourself.

Need some docs links to get started? Here are some pages to focus on:

Best of luck on your exam!

Additional Resources

 

Never miss a new post. Check out this short guide on how to subscribe to the blog and get updates. 
