Routing Data to Different Splunk Indexes in the OpenTelemetry Collector

This blog post is part of an ongoing series on OpenTelemetry.
The OpenTelemetry project is the second-largest project of the Cloud Native Computing Foundation (CNCF). The CNCF is part of the Linux Foundation and, besides OpenTelemetry, also hosts Kubernetes, Jaeger, Prometheus, and Helm, among others.
OpenTelemetry defines a model to represent traces, metrics, and logs. Using this model, it provides libraries in different programming languages that allow folks to collect this data. Just as important, the project delivers an executable named the OpenTelemetry Collector, which receives, processes, and exports data through pipelines.
The OpenTelemetry Collector uses a component-based architecture, which allows folks to devise their own distribution by picking and choosing which components they want to support. At Splunk, we manage our distribution of the OpenTelemetry Collector in this open source repository. The repository contains our configuration and hardening parameters as well as examples.
This blog post will walk you through using OpenTelemetry processors to route log data to different indexes depending on its content.
WARNING: WE ARE DISCUSSING A CURRENTLY UNSUPPORTED CONFIGURATION. When sending data to Splunk Enterprise, we currently only support use of the OpenTelemetry Collector in Kubernetes environments. As always, use of the Collector is fully supported to send data to Splunk Observability Cloud.
This blog post is considered an advanced topic. If you are just catching up with the series, please go back and read about how to set up log ingestion. In this example, we will take a simple log ingestion setup and apply new techniques to change how data is routed.
Previously, we discussed the concept of pipelines in OpenTelemetry. Pipelines contain three element types: receivers, processors, and exporters. We are going to spend a bit more time with processors in this post, but first, let’s revisit a filelog receiver that does more than the one in our previous post:
receivers:
  filelog:
    include: [ /output/file*.log ]
    operators:
      - type: regex_parser
        regex: '(?P<logindex>log\d?)'
What this receiver does is read each line in any file in the output folder whose name starts with file and ends with the .log extension. For each line, it applies a regular expression that matches a portion of the line, extracting log\d? into the attribute logindex.
Log records have a body, which contains the log message. They can also hold attributes. So each of our records now has an attribute based on the content of its line.
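To make this concrete, here is a hypothetical input line (in the format the containers shown later will produce) and a conceptual view of the resulting log record; the exact record structure depends on the collector version:
# Hypothetical input line written to /output/file2.log:
#   Sun Jan 30 10:15:42 UTC 2022 log2 new message
# Conceptual view of the log record after the regex_parser operator runs:
body: "Sun Jan 30 10:15:42 UTC 2022 log2 new message"
attributes:
  logindex: "log2"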
Now, we use processors to apply different behaviors based on the value of the logindex attribute:
processors:
  batch:
  attributes/log:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs"
      - key: logindex
        action: delete
When the value of the logindex attribute is ‘log’, this processor does the following:
- Set the attribute com.splunk.index to the value logs
- Delete the logindex attribute.
The com.splunk.index attribute has significance here: it overrides the default index configured on the Splunk HEC exporter. You can see all the configuration fields in the README of the Splunk HEC exporter.
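For reference, a minimal Splunk HEC exporter configuration might look like the sketch below; the token, endpoint, and default index values are placeholders, and the actual example in the repository may differ:
exporters:
  splunk_hec:
    # Placeholder values; adjust for your environment
    token: "00000000-0000-0000-0000-000000000000"
    endpoint: "https://splunk:8088/services/collector"
    # Default index, used when a record carries no com.splunk.index attribute
    index: "main"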
We are going to create data that satisfies this processor and routes logs to the “logs” index. For fun, we are also going to use the “log2” and “log3” values to route to the “logs2” and “logs3” indexes respectively, with two more processors that follow the same pattern, as sketched below.
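A sketch of what those additional processors could look like, assuming they mirror attributes/log and live under the same processors section; the names attributes/log2 and attributes/log3 are illustrative, so check the repository example for the exact configuration:
  attributes/log2:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log2' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs2"
      - key: logindex
        action: delete
  attributes/log3:
    include:
      match_type: strict
      attributes:
        - { key: logindex, value: 'log3' }
    actions:
      - key: com.splunk.index
        action: upsert
        value: "logs3"
      - key: logindex
        action: delete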
Let’s use bash containers in a Docker Compose file to generate some content:
logging:
  image: bash:latest
  container_name: logging
  command: 'bash -c "while(true) do echo \"$$(date) log new message\" >> /output/file.log ; sleep 1; done"'
  volumes:
    - ./output:/output
logging2:
  image: bash:latest
  container_name: logging2
  command: 'bash -c "while(true) do echo \"$$(date) log2 new message\" >> /output/file2.log ; sleep 1; done"'
  volumes:
    - ./output:/output
logging3:
  image: bash:latest
  container_name: logging3
  command: 'bash -c "while(true) do echo \"$$(date) log3 new message\" >> /output/file3.log ; sleep 1; done"'
  volumes:
    - ./output:/output
Our Docker Compose file also runs the collector and Splunk Enterprise, just as in our log ingestion example.
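To tie everything together, the collector configuration also needs a logs pipeline that wires the filelog receiver, the processors, and the Splunk HEC exporter. A minimal sketch, assuming the processor and exporter names used above (the repository example may differ):
service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [batch, attributes/log, attributes/log2, attributes/log3]
      exporters: [splunk_hec]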
We have put this all together into an example that lives under Splunk’s OpenTelemetry Collector GitHub repository. To run this example, you will need at least 4 gigabytes of RAM, as well as git and Docker Desktop installed.
First, check out the repository using git clone:
git clone https://github.com/signalfx/splunk-otel-collector.git
Using a terminal window, navigate to the folder examples/otel-logs-processor-splunk.
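For example, from the directory where you ran git clone:
cd splunk-otel-collector/examples/otel-logs-processor-splunk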
Type:
docker-compose up
This will start the OpenTelemetry Collector, our containers generating log data, and Splunk Enterprise.
Your terminal will display information as Splunk starts. Eventually, Splunk will display the same information as in our last blog post to let us know it is ready.
Now, you can open your web browser and navigate to http://localhost:18000. You can log in as admin/changeme.
You will be met with a few prompts as this is a new Splunk instance. Make sure to read and acknowledge them, and open the default search application.
Now, we can query our logs index. Enter the search index=logs to see the logs sent to Splunk:
You can change the search to see the contents of the index logs2:
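For example, these searches show the records routed to the other indexes:
index=logs2
index=logs3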
When you have finished exploring this example, you can press Ctrl+C to exit from Docker Compose. Thank you for following along! This concludes our first look at using processors to manipulate log records. We have used the OpenTelemetry Collector to successfully route data to different indexes of our Splunk Enterprise instance.
If you found this example interesting, feel free to star the repository! Just click the star icon in the top right corner. Any ideas or comments? Please open an issue on the repository.
— Antoine Toulme, Senior Engineering Manager, Blockchain & DLT