
Guide: Isolated OpenTelemetry Tracing for Multiple WARs in WildFly

ErwinAtSplunk
Splunk Employee

Isolating Telemetry Boundaries: How to Trace Multiple WARs as Separate Services in One WildFly JVM

 

Executive Summary

 

Key Takeaways

  • The Core Problem: JVM-level agents view application servers as a single service boundary, blending spans from separate applications together.

  • The Architecture Shift: Removing the -javaagent flag allows WildFly's native OpenTelemetry and MicroProfile subsystems to spawn isolated SDK instances per WAR.

  • The Configuration: Define custom service.name values inside each deployment's microprofile-config.properties and scrape JVM metrics directly via the Prometheus management endpoint.

 

 

Deploy app1.war and app2.war into the same WildFly JVM, attach the default OpenTelemetry Java agent, and both applications show up as one service.

The Java agent starts once at JVM level. So it gets one OpenTelemetry configuration, one resource, and one service.name.

That works well for many modern Java services. But it is the wrong boundary for an application server that hosts multiple applications.

Running multiple applications in one Java application server is still common. It is efficient, familiar, proven technology and often exactly how the platform is designed.

But if app1.war and app2.war both report the same service.name, your observability backend cannot tell which application produced which span.

That is not what we want.

The good news: monitoring multiple applications in one application server as separate services is possible! The default OpenTelemetry Java agent path does not support it, but with a few targeted WildFly and application changes it will work. We will set up each deployment with its own service.name, keep Server-Timing for RUM to APM correlation, and keep JVM metrics flowing through the OpenTelemetry Collector.

If you want to run it yourself, the GitHub repository contains the scripts, configuration, and a more technical guide.

Architectural Problem: The JVM Boundary Mismatch

The OpenTelemetry Java agent is attached to the JVM with -javaagent. For many modern Java services, that is a great default. It instruments the process early, discovers libraries, exports traces, and can collect JVM-level telemetry.

When Splunk Observability is the backend, the Splunk Java agent is the logical starting point. It is based on the OpenTelemetry Java agent and adds useful Splunk-specific convenience, like automatically adding Server-Timing headers for RUM to APM correlation.

But it also means the agent sees the world from the JVM boundary. And that is where the mismatch starts.

In a WildFly setup with multiple WAR files, that boundary is too wide. The JVM contains multiple applications, but the agent owns one OpenTelemetry SDK and one set of resource attributes.

So this kind of setup:

WildFly JVM
+-- app1.war
`-- app2.war

turns into this kind of observability model:

service.name = wildfly
+-- spans from app1
`-- spans from app2

That makes troubleshooting harder than it needs to be. In Splunk Observability Cloud, service identity drives filters, service maps, dashboards, alerts, and ownership. If every deployment has the same service.name, you cannot quickly tell which application produced which span.

And this is where most guides stop. They assume one Java process is one service. That assumption does not hold here.

Core Limitations: Why a Java Agent Cannot Solve This

OpenTelemetry-based Java agents operate at JVM level. The agent starts once and applies one configuration to every deployed application.

That is the core limitation. Not a bug. Just the wrong boundary for this setup.

There is no setting that says:

app1.war -> service.name=app1
app2.war -> service.name=app2

The agent can set otel.service.name, but that value belongs to the process. Not to each deployment inside the process.

So we remove the agent from this setup.

That sounds bigger than it is. We do not remove observability. We replace the agent responsibilities with pieces that understand the WildFly deployment model.
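To make the boundary problem concrete, here is a minimal Java sketch of why a process-level setting cannot tell deployments apart. It is not agent code. It only relies on the fact that system properties are shared by the whole JVM. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: models a JVM-wide setting, not the agent's real internals.
final class ProcessWideServiceName {

    // System properties belong to the JVM, so the deployment name cannot
    // influence the result. The parameter is deliberately unused.
    static String serviceNameSeenBy(String deployment) {
        return System.getProperty("otel.service.name", "unknown_service:java");
    }

    // Resolves the service name for every deployment in order.
    static Map<String, String> resolveAll(String... deployments) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String deployment : deployments) {
            result.put(deployment, serviceNameSeenBy(deployment));
        }
        return result;
    }

    public static void main(String[] args) {
        System.setProperty("otel.service.name", "wildfly");
        // prints {app1.war=wildfly, app2.war=wildfly}
        System.out.println(resolveAll("app1.war", "app2.war"));
    }
}
```

Both WARs resolve to the same name because the lookup has nothing deployment-specific to work with.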

Technical Solution: WildFly's MicroProfile Telemetry + OpenTelemetry Subsystems

WildFly ships two cooperating subsystems that together solve this exact problem:

  • opentelemetry — provides the OTel API/SDK runtime classes and per-deployment SDK lifecycle.

  • microprofile-telemetry — bridges each WAR's MicroProfile Config (otel.* properties) into its own SDK instance.

Together, they create an isolated OpenTelemetry SDK instance per deployment, and each WAR reads its own configuration through MicroProfile Config.

That gives us the model we need:

WildFly JVM
+-- app1.war -> service.name=app1
`-- app2.war -> service.name=app2

The reference setup in this post uses WildFly. The same approach also applies to JBoss EAP 8.

Profile Selection: Pick the Right WildFly Profile

The two subsystems we need are pre-enabled in the shipped standalone-microprofile.xml profile. So instead of editing config files, we start WildFly with the -c flag:

./bin/standalone.sh -c standalone-microprofile.xml

The relevant pre-enabled pieces look like this inside that profile:

<extension module="org.wildfly.extension.opentelemetry"/>
<extension module="org.wildfly.extension.microprofile.telemetry"/>
...
<subsystem xmlns="urn:wildfly:opentelemetry:1.1"/>
<subsystem xmlns="urn:wildfly:microprofile-telemetry:1.0"/>

The opentelemetry subsystem element is intentionally empty. With nothing configured at the server level (no exporter, no sampler, no service name), the subsystem provides only the runtime classes. All SDK configuration is read per deployment from each WAR's MicroProfile Config, which we set up in a later step. That is what gives every WAR its own service.name.

The default standalone.xml profile does not include the microprofile-telemetry subsystem, which is the one that bridges MicroProfile Config into the OTel SDK. Without it, per-deployment otel.service.name values are ignored and traces fall back to using the WAR filename (e.g. app1.war) as the service name. That is why we pick the MicroProfile profile.

Agent Removal: Disabling the JVM-Level Java Agent

Next we remove the Java agent from standalone.conf.

That means removing settings like these:

JAVA_OPTS="$JAVA_OPTS -javaagent:/path/to/splunk-otel-javaagent.jar"
JAVA_OPTS="$JAVA_OPTS -Dotel.service.name=wildfly"

No replacement JVM options are needed for traces.

A Java agent and WildFly's per-deployment OTel SDK setup cannot both own OpenTelemetry in the same JVM. The agent claims GlobalOpenTelemetry at JVM startup. Even with otel.sdk.disabled=true, it can still register a disabled SDK globally.

WildFly then sees that global SDK and does not initialize its own per-deployment SDK instances.

So the agent has to go. Otherwise WildFly never gets the chance to create the per-deployment SDK instances we need.
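The ownership conflict can be pictured with a simplified model of a set-once global. This sketch is neither WildFly nor OpenTelemetry code. The class name and the claimant labels are hypothetical, and it only illustrates the "first claimant wins" behavior described above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified model of a set-once global holder (hypothetical names).
final class SetOnceGlobal {

    private final AtomicReference<String> owner = new AtomicReference<>();

    // Returns true if the caller became the owner, false if someone was first.
    boolean tryClaim(String who) {
        return owner.compareAndSet(null, who);
    }

    String owner() {
        return owner.get();
    }

    public static void main(String[] args) {
        SetOnceGlobal global = new SetOnceGlobal();
        // The agent starts with the JVM, so it claims the global first.
        System.out.println(global.tryClaim("java-agent"));        // prints true
        // The server-side setup arrives later and finds it taken.
        System.out.println(global.tryClaim("wildfly-subsystem")); // prints false
        System.out.println(global.owner());                       // prints java-agent
    }
}
```

In this model the later claimant has no way to take over, which mirrors why removing the agent is the practical fix.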

Target Configurations: Setting Up Each WAR Deployment

Now let's give each WAR its own service name.

Create this file inside each deployment:

src/main/resources/META-INF/microprofile-config.properties

For app1:

otel.sdk.disabled=false
otel.service.name=app1
otel.exporter.otlp.endpoint=http://localhost:4317

For app2:

otel.sdk.disabled=false
otel.service.name=app2
otel.exporter.otlp.endpoint=http://localhost:4317

These three small properties create a big effect:

  • otel.sdk.disabled=false opts this deployment in to its own OTel SDK instance.

  • otel.service.name is the per-deployment identity we need.

  • otel.exporter.otlp.endpoint points at the OpenTelemetry Collector.

MicroProfile Config is the standard configuration mechanism used by the deployment. WildFly reads these otel.* properties per WAR, so each application gets its own resource attributes and its own exporter wiring.
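As a rough illustration of what "per WAR" means, the sketch below parses two properties files independently with plain java.util.Properties. It is not how WildFly reads the files internally. The class name is hypothetical, and the default of true for otel.sdk.disabled is taken from the MicroProfile Telemetry specification.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the three properties used in this guide.
final class DeploymentTelemetryConfig {

    final String serviceName;
    final String endpoint;
    final boolean enabled;

    private DeploymentTelemetryConfig(String serviceName, String endpoint, boolean enabled) {
        this.serviceName = serviceName;
        this.endpoint = endpoint;
        this.enabled = enabled;
    }

    // Parses the text of one deployment's microprofile-config.properties.
    static DeploymentTelemetryConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Assumption: the SDK counts as disabled unless the file says otherwise.
        boolean disabled = Boolean.parseBoolean(props.getProperty("otel.sdk.disabled", "true"));
        return new DeploymentTelemetryConfig(
                props.getProperty("otel.service.name"),
                props.getProperty("otel.exporter.otlp.endpoint"),
                !disabled);
    }

    public static void main(String[] args) {
        String common = "otel.sdk.disabled=false\n"
                + "otel.exporter.otlp.endpoint=http://localhost:4317\n";
        DeploymentTelemetryConfig app1 = parse(common + "otel.service.name=app1\n");
        DeploymentTelemetryConfig app2 = parse(common + "otel.service.name=app2\n");
        System.out.println(app1.serviceName + " / " + app2.serviceName); // prints app1 / app2
    }
}
```

Each parse call sees only its own file, so the two deployments never share a service name.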

You can add more OpenTelemetry resource attributes in the same file. At minimum, add a version or build identifier and a deployment environment:

otel.resource.attributes=service.version=1.2.0,deployment.environment=production

That gives every span more useful context. service.name tells us which application emitted the span. service.version and deployment.environment tell us which version and environment produced it.
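The value of otel.resource.attributes is a comma-separated list of key=value pairs. The sketch below shows that format with a small hand-written parser. The SDK does its own parsing, so this is only an illustration of the shape of the value. The class name is hypothetical, and details such as percent-encoding are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that shows the key=value,key=value format.
final class ResourceAttributesSketch {

    static Map<String, String> parse(String raw) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return attributes;
        }
        for (String pair : raw.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip entries without a key or without '='
            }
            attributes.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return attributes;
    }

    public static void main(String[] args) {
        // prints {service.version=1.2.0, deployment.environment=production}
        System.out.println(parse("service.version=1.2.0,deployment.environment=production"));
    }
}
```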

Discovery Activation: Enabling CDI Beans within the WAR

There is one more application-side step.

WildFly's OpenTelemetry integration registers its JAX-RS tracing filter as a CDI bean. CDI (Contexts and Dependency Injection) manages bean discovery and lifecycle inside the WAR.

So each deployment needs WEB-INF/beans.xml:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="https://jakarta.ee/xml/ns/jakartaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="https://jakarta.ee/xml/ns/jakartaee https://jakarta.ee/xml/ns/jakartaee/beans_4_0.xsd"
       version="4.0"
       bean-discovery-mode="all">
</beans>

Without this file, the tracing filter is not discovered. And then the deployment stays silent.
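Because a missing file fails silently, a quick packaging check can save debugging time. The sketch below lists which of the two application-side files are absent from a built WAR. It assumes standard Maven WAR packaging, where src/main/resources ends up under WEB-INF/classes. The class name is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical pre-deployment check for the two files this guide relies on.
final class WarTelemetryCheck {

    // Assumption: Maven copies src/main/resources into WEB-INF/classes.
    static final List<String> REQUIRED = Arrays.asList(
            "WEB-INF/classes/META-INF/microprofile-config.properties",
            "WEB-INF/beans.xml");

    // Returns the required entries that are not present in the archive.
    static List<String> missing(Set<String> entryNames) {
        List<String> absent = new ArrayList<>();
        for (String required : REQUIRED) {
            if (!entryNames.contains(required)) {
                absent.add(required);
            }
        }
        return absent;
    }

    // Reads all entry names from a WAR, which is a regular ZIP archive.
    static Set<String> entryNames(String warPath) throws IOException {
        try (ZipFile zip = new ZipFile(warPath)) {
            return zip.stream().map(ZipEntry::getName).collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: WarTelemetryCheck <path-to-war>");
            return;
        }
        List<String> absent = missing(entryNames(args[0]));
        System.out.println(absent.isEmpty() ? "OK" : "Missing: " + absent);
    }
}
```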

Frontend Correlation: Adding Server-Timing Headers Again

One thing I do like about the Splunk Java agent is that it automatically adds Server-Timing response headers. Those headers expose the trace context to the browser and frontend tools, enabling RUM to APM correlation.

When we remove the agent, we lose that automatic behavior. So we add it back.

The fix is a small JAX-RS filter in each WAR.

First add the OpenTelemetry API dependency. WildFly provides it at runtime, so the Maven scope is provided:

<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
    <version>1.40.0</version>
    <scope>provided</scope>
</dependency>

Then add the filter:

package com.example.app1;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanContext;
import jakarta.annotation.Priority;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.ext.Provider;

@Provider
@Priority(Priorities.USER)
public class ServerTimingFilter implements ContainerRequestFilter, ContainerResponseFilter {

    private static final String SPAN_CONTEXT_PROP = "otel.span.context";

    @Override
    public void filter(ContainerRequestContext requestContext) {
        SpanContext spanContext = Span.current().getSpanContext();
        if (spanContext.isValid()) {
            requestContext.setProperty(SPAN_CONTEXT_PROP, spanContext);
        }
    }

    @Override
    public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) {
        SpanContext spanContext = (SpanContext) requestContext.getProperty(SPAN_CONTEXT_PROP);
        if (spanContext != null) {
            String traceparent = "00-" + spanContext.getTraceId() + "-" + spanContext.getSpanId() + "-01";
            responseContext.getHeaders().add("Server-Timing", "traceparent;desc=\"" + traceparent + "\"");
            responseContext.getHeaders().add("Access-Control-Expose-Headers", "Server-Timing");
        }
    }
}

This filter implements both ContainerRequestFilter and ContainerResponseFilter.

During the request phase, the active span is still available. We capture its SpanContext and store it on the request. During the response phase, we read that stored context and write the Server-Timing header.

The response now includes:

Server-Timing: traceparent;desc="00-<traceId>-<spanId>-01"
Access-Control-Expose-Headers: Server-Timing

That restores the frontend correlation behavior we lost by removing the agent.
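To check the header format without a running server, the value can be built and parsed with plain string handling. The sketch below is a stand-alone illustration, not part of the filter. The class name is hypothetical, the IDs are sample values, and the sampled parameter is an addition for illustration (the filter above always writes 01).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for building and reading the Server-Timing value.
final class ServerTimingTraceparent {

    // version 00, 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
    private static final Pattern HEADER = Pattern.compile(
            "traceparent;desc=\"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\"");

    static String format(String traceId, String spanId, boolean sampled) {
        String flags = sampled ? "01" : "00";
        return "traceparent;desc=\"00-" + traceId + "-" + spanId + "-" + flags + "\"";
    }

    // Returns {traceId, spanId, flags}, or empty if the value does not match.
    static Optional<String[]> parse(String headerValue) {
        Matcher matcher = HEADER.matcher(headerValue);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {matcher.group(1), matcher.group(2), matcher.group(3)});
    }

    public static void main(String[] args) {
        String value = format("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        System.out.println(value);
        parse(value).ifPresent(parts -> System.out.println("trace ID: " + parts[0]));
    }
}
```

A test like this is a cheap way to catch formatting mistakes, such as a missing quote, before they break correlation in the browser.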

Deep Visibility: Manual Instrumentation and Custom Child Spans

Automatic JAX-RS tracing is useful, but application-level spans make traces much more readable.

WildFly's OpenTelemetry subsystem also supports annotation-based manual instrumentation with @WithSpan and @SpanAttribute.

For those annotations, add the instrumentation annotation dependency to the WAR that uses them:

<dependency>
    <groupId>io.opentelemetry.instrumentation</groupId>
    <artifactId>opentelemetry-instrumentation-annotations</artifactId>
    <version>2.9.0</version>
    <scope>provided</scope>
</dependency>

In the sample application, app2 has a small calculation service:

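import io.opentelemetry.instrumentation.annotations.SpanAttribute;
import io.opentelemetry.instrumentation.annotations.WithSpan;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class CalculationService {

    // Wraps each call in a child span named "simulate-calculation".
    @WithSpan("simulate-calculation")
    public long calculate(@SpanAttribute("calculation.sleep_ms") long sleepMs) {
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
        }
        return sleepMs;
    }
}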

The annotation creates a child span called simulate-calculation. The @SpanAttribute annotation records the sleep duration as calculation.sleep_ms.

That method is called from the app2 JAX-RS resource. The randomSleepMs() helper and the injected calculationService field are not shown here:

@GET
@Produces(MediaType.TEXT_PLAIN)
public Response getValue() {
    long sleepMs = CalculationService.randomSleepMs();
    calculationService.calculate(sleepMs);
    return Response.ok("TWO").build();
}

The result is exactly what we want. Automatic request spans, plus meaningful application spans inside the same trace.

Infrastructure Setup: Configuring the OpenTelemetry Collector

WildFly exports traces over OTLP gRPC, so the collector needs a gRPC receiver on port 4317:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"

The HTTP receiver on 4318 is still useful for other tools. The sample repository uses it for a small otel-cli test span.

Telemetry Parity: Scraping JVM Metrics via Prometheus Endpoints

Removing the Java agent also means removing agent-based JVM metrics collection.

Fortunately, WildFly already exposes JVM and server metrics in Prometheus format at the management endpoint:

http://localhost:9990/metrics

So the OpenTelemetry Collector can scrape that endpoint directly:

receivers:
  prometheus/wildfly:
    config:
      scrape_configs:
        - job_name: wildfly
          scrape_interval: 10s
          static_configs:
            - targets: ["localhost:9990"]
          metrics_path: /metrics

service:
  pipelines:
    metrics:
      receivers:
        - prometheus/wildfly
      # Processors and exporters are omitted here; a pipeline needs at least one exporter.

This gives us JVM metrics like heap, GC, threads, and class loading. It also gives us WildFly server metrics exposed through the management endpoint. Extra subsystem metrics depend on the statistics settings you enable in WildFly.

No extra Java agent is needed.
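For readers who have not looked at the Prometheus text format before, the sketch below parses a small sample of it. The metric names in the sample are hypothetical placeholders, not actual WildFly metric names, and the parser is deliberately minimal: it assumes no trailing timestamps and skips values it cannot read as a number.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal reader for Prometheus text exposition lines (hypothetical helper).
final class PrometheusTextSketch {

    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String rawLine : body.split("\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blanks plus HELP and TYPE comments
            }
            // Assumption: the value is the last space-separated token.
            int space = line.lastIndexOf(' ');
            if (space <= 0) {
                continue;
            }
            try {
                samples.put(line.substring(0, space),
                        Double.parseDouble(line.substring(space + 1)));
            } catch (NumberFormatException e) {
                // Ignore values this sketch does not understand, such as +Inf.
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        String sample = "# HELP hypothetical_heap_used_bytes Example only\n"
                + "# TYPE hypothetical_heap_used_bytes gauge\n"
                + "hypothetical_heap_used_bytes{area=\"heap\"} 1048576\n"
                + "hypothetical_thread_count 42\n";
        parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The collector's prometheus receiver handles all of this for you. The sketch is only meant to show what the endpoint returns.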

End Result: What Distributed Telemetry Looks Like

Once everything is wired up, the result becomes visible in the observability backend. This is the part that matters most. Not the XML. Not the properties file. The services finally show up as separate applications.

[Figure: app1 and app2 shown as separate services]

The same trace can still cross application boundaries. In this example, app1 calls app2, and the trace keeps its parent-child relationship.

[Figure: single trace crossing app1 and app2]

And if you rely on browser-to-backend correlation, the response still exposes the trace context through the Server-Timing header.

[Figure: Server-Timing header with traceparent]

That is the result we were after. Two WARs. One WildFly JVM. Two OpenTelemetry services. The application server stays as-is, but the telemetry finally reflects the applications inside it.

Deployment Variance: A Note on JBoss EAP Compatibility

This setup uses WildFly 39.0.1.Final, but the same technique applies to JBoss EAP 8.0+. EAP ships the same standalone-microprofile.xml profile with the microprofile-telemetry and opentelemetry subsystems pre-enabled, so the recipe is identical: start the server with -c standalone-microprofile.xml and put the otel.* properties in each WAR's microprofile-config.properties.

The GlobalOpenTelemetry conflict with the Splunk agent applies equally to EAP, for the same reason: a JVM-level agent that claims GlobalOpenTelemetry at startup prevents the server from creating per-deployment SDK instances.

Feature Trade-offs: What About Profiling?

The one Splunk-specific thing this setup does not replace is AlwaysOn Profiling.

If profiling is a hard requirement, the compatible agent-based option is to run each application in a separate WildFly or EAP instance. Then each JVM can own its own full agent setup.

For lightweight profiling without touching OpenTelemetry, JDK Flight Recorder is a separate option. It is built into the JDK and does not interact with the OTel SDK.

Implementation Roadmap: Wrapping Up

The default OpenTelemetry Java agent approach does not support separate service.name values for multiple applications inside one application server. The agent sees one JVM, so it creates one service identity.

But that does not mean the architecture is unobservable.

With WildFly's MicroProfile Telemetry + OpenTelemetry subsystems, each WAR gets its own SDK instance. With MicroProfile Config, each deployment gets its own service.name. With a small JAX-RS filter, we keep Server-Timing headers. And with the collector scraping WildFly's /metrics endpoint, JVM metrics keep flowing without the agent.

So the final setup keeps the application server model intact:

WildFly JVM
+-- app1.war -> service.name=app1
`-- app2.war -> service.name=app2

That is the core takeaway. Even though the default Java agent path does not support this monitoring model, the MicroProfile profile, per-WAR config, CDI activation, a small response filter, and a collector metrics scrape give you the same operational outcome with the right service boundaries. The repository is there if you want the hands-on version with all files in place.

 
