
3 Things I Love About OpenTelemetry

dmitch
Splunk Employee

In 2022, I made the decision to focus my career on OpenTelemetry.  I was excited by the technology and, after nearly a decade of working with proprietary APM agents, I believed it was the future of instrumentation. 

This ultimately led me to join Splunk in 2023 as an Observability Specialist.  Splunk Observability Cloud is OpenTelemetry-native, so this role allowed me to work extensively with OpenTelemetry as customers of all sizes implemented it in their organizations. 

So how am I feeling about OpenTelemetry in 2024?  Well, I’m even more excited about it than before!  In this article, I’ll share the top three things that I love about OpenTelemetry.   

 

#1:  It’s Easy to Use! 

In the beginning, instrumenting an application with OpenTelemetry required code changes.  This is referred to as code-based instrumentation or manual instrumentation.  

It was great for early enthusiasts who were passionate about observability, wanted full control over their telemetry, and didn’t mind spending time instrumenting their code by hand.  

OpenTelemetry has come a long way since then, and now offers some form of auto-instrumentation support for the most popular languages, such as Java, .NET, Node.js, and Python.  This approach is also referred to as a zero-code solution. 

This makes it easier for organizations to get up and running quickly with OpenTelemetry, just as they would with a proprietary solution based on traditional APM agents, while also giving them the flexibility to layer on custom instrumentation if desired. 

The advent of the OpenTelemetry Operator for Kubernetes has also made it easier to instrument applications running in Kubernetes.  Specifically, the OpenTelemetry Operator can automatically inject and configure instrumentation libraries for multiple languages.  This makes it simple for organizations using Kubernetes to instrument their applications using OpenTelemetry. 
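For illustration, here's a minimal sketch of how that can look.  The resource name and collector endpoint below are hypothetical, and the exact fields can vary by Operator version: 

    # Instrumentation resource that tells the Operator how to instrument pods
    apiVersion: opentelemetry.io/v1alpha1
    kind: Instrumentation
    metadata:
      name: my-instrumentation                   # hypothetical name
    spec:
      exporter:
        endpoint: http://otel-collector:4317     # hypothetical collector endpoint

    # Then opt a workload in by annotating its pod template, e.g. for Java:
    #   instrumentation.opentelemetry.io/inject-java: "true"

With the annotation in place, the Operator injects and configures the language instrumentation for matching pods automatically. 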

Ultimately, these ease-of-use improvements have made OpenTelemetry more accessible and have dramatically reduced the time to value, as it's now possible to be up and running with OpenTelemetry in just minutes. 

 

#2:  Flexibility of the OpenTelemetry Collector Architecture 

The collector is my favorite part of OpenTelemetry (and perhaps the most flexible yet elegant architecture I’ve encountered in my career). 

While the concept of the collector originated in 2017 as part of the OpenCensus project at Google, with OpenTelemetry it has evolved to become a mature and highly flexible software component that many organizations depend on. 

The OpenTelemetry Collector allows one or more pipelines to be configured, which define how data is received, processed, and exported.  This data can include metrics, traces, and logs.  A pipeline can be depicted as follows:  

[Diagram: an OpenTelemetry Collector pipeline, showing data flowing from Receivers through Processors to Exporters]

Source:  https://opentelemetry.io/docs/collector/architecture/ 

Each pipeline has one or more Receivers that accept metric, trace, or log data from various sources.  The data then passes through one or more Processors, which can transform, filter, or otherwise manipulate it.  Finally, the data is sent via Exporters to one or more observability backends, which lets organizations decide exactly where they want to send their observability data. 

This architecture provides near-infinite flexibility.  For example, if you want to send your metrics to one observability backend and your traces and logs to another, no problem!  Or if you want to send a subset of traces to a backend in a particular jurisdiction to comply with data residency requirements, sure!  
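To make that concrete, here's a hedged sketch of a collector configuration that sends metrics to one backend and traces and logs to another.  The exporter endpoints are placeholders; the exporters you actually use will depend on your backends: 

    receivers:
      otlp:
        protocols:
          grpc:
          http:

    processors:
      batch:

    exporters:
      # Hypothetical backends -- substitute your own endpoints or vendor exporters
      otlphttp/metrics_backend:
        endpoint: https://metrics.example.com
      otlphttp/tracing_backend:
        endpoint: https://telemetry.example.com

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/metrics_backend]
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/tracing_backend]
        logs:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlphttp/tracing_backend]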

Here are a few additional examples of what you can do with the collector (a configuration sketch follows the list): 

  • Use the Resource Detection Processor to gather additional information about the host it’s running on, and add it as context to metrics, spans, and logs. 
  • Use the Redaction Processor to redact sensitive data before it leaves your network.  
  • Use the Transform Processor to rename span attributes, to ensure naming conventions are enforced across all of your observability data. 
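As a rough sketch of how these might be configured (the attribute names and regex below are purely illustrative), the processor section of a collector configuration could look like the following; each processor also needs to be added to the relevant pipelines: 

    processors:
      # Detect host and environment attributes and attach them as resource context
      resourcedetection:
        detectors: [env, system]

      # Mask values that match sensitive patterns before data leaves the network
      redaction:
        allow_all_keys: true
        blocked_values:
          - "4[0-9]{12}(?:[0-9]{3})?"        # illustrative card-number pattern

      # Rename a span attribute to enforce a naming convention
      transform:
        trace_statements:
          - context: span
            statements:
              - set(attributes["http.request.method"], attributes["http.method"])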

The architecture also allows collectors to be chained.  Typically, this means running a collector in agent mode on each host or Kubernetes node to gather data from applications running on that host along with infrastructure-related data.  These collectors will then export their data to another collector running in gateway mode, which will perform additional processing on the data before it’s exported to one or more observability backends.  This collector architecture is depicted in the following diagram: 

[Diagram: chained collectors, with agent-mode collectors on each host exporting to a gateway-mode collector that sends data on to the observability backends]

Source:  https://opentelemetry.io/docs/collector/architecture/ 
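As a hedged sketch, the agent-mode side of that chain might be configured something like this, where the gateway hostname is a placeholder: 

    # Agent-mode collector running on each host or Kubernetes node
    receivers:
      otlp:
        protocols:
          grpc:
      hostmetrics:            # infrastructure metrics from the local host
        scrapers:
          cpu:
          memory:

    exporters:
      # Forward everything to the gateway-mode collector (hypothetical hostname)
      otlp:
        endpoint: gateway-collector.example.com:4317

    service:
      pipelines:
        metrics:
          receivers: [otlp, hostmetrics]
          exporters: [otlp]
        traces:
          receivers: [otlp]
          exporters: [otlp]

The gateway collector then receives this data over OTLP, applies any additional processing, and exports it to the observability backends. 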

 

#3:  Support for Logs  

While metrics and traces have been Generally Available (GA) in OpenTelemetry for several years, it wasn’t until November 2023 that logs joined these other signals and became GA as well.  

This was a tremendous step forward, as logs play a critical role in the troubleshooting process, frequently providing the details that engineers need to understand why a particular issue is occurring. 

I love that OpenTelemetry provides so many different ways to ingest logs, including support for Fluent Bit and Fluentd via the Fluent Forward Receiver, as well as the versatile Filelog Receiver, which can be configured to ingest logs from just about any file-based source. 
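As a rough illustration (the file path and listen address are placeholders), both receivers need only a few lines of configuration: 

    receivers:
      # Tail application log files on disk
      filelog:
        include:
          - /var/log/myapp/*.log        # hypothetical path
      # Accept logs forwarded by Fluent Bit or Fluentd
      fluentforward:
        endpoint: 0.0.0.0:8006          # example listen address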

And it gets even better with Kubernetes: the collector Helm chart now includes a Logs Collection Preset.  This preset uses the Filelog Receiver under the hood and provides all of the configuration needed to automatically collect logs from the standard output of Kubernetes containers. 
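Enabling it is just a few lines in the chart's values file (a minimal sketch, assuming the upstream OpenTelemetry Collector Helm chart running as a DaemonSet): 

    # values.yaml for the OpenTelemetry Collector Helm chart
    mode: daemonset
    presets:
      logsCollection:
        enabled: true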

Collecting logs with OpenTelemetry means we can apply all of the collector's power and flexibility, discussed in the previous section, to logs as well.  And the metrics, traces, and logs collected with OpenTelemetry share the same Semantic Conventions, which makes it possible to correlate these different signals.  For example, log events that include the TraceId, SpanId, and TraceFlags fields can be linked to the corresponding trace data.  This makes it easy to jump between related logs, traces, and metrics when troubleshooting an issue, where time is of the essence. 

I also love that some languages, such as Java, have started to collect logs automatically with OpenTelemetry.  There's no need to even use the Filelog Receiver to ingest the application logs, as everything is captured by the OpenTelemetry SDK under the hood.  In addition to requiring less configuration effort, collecting logs this way is also more performant, as there's no need to read application log files from the host filesystem and parse them. 

 

Summary

Thanks for taking the time to read my thoughts on OpenTelemetry.  Please leave a comment or reach out to let us know what you love about OpenTelemetry. 

 
