
Kafka JMX metrics are not visible in SignalFx despite being exported successfully from splunk-otel-collector.

aashoksi_cisco
Splunk Employee

Hi,

I have configured the Kafka JMX metrics receiver in splunk-otel-collector, and the collector logs show that the metrics (Kafka JMX + JVM) are exported successfully to SignalFx, but those metrics are not visible in SignalFx. In a SignalFx chart I get 0 time series for all JMX metrics. To verify, I checked SignalFx usage analytics, where all of these metrics do appear, yet the charts themselves show no data. Please find below the logs showing some of the metrics. One metric in the log, "queueSize", is not a JMX metric; that one does come through, but the others are not being fetched.
I would appreciate any suggestions or input to resolve this issue.
Based on the logs, it seems clear that splunk-otel-collector is doing its job correctly, but the data is not showing up in SignalFx due to some unknown issue on the SignalFx side.
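
For reference, the wiring involved looks roughly like the sketch below; the jar path, broker endpoint, realm, and token are placeholders, not my exact config:

Config (trimmed, placeholder values):
---------------------------------------------------------------------------------------------------
receivers:
  jmx:
    # Bundled JMX metric gatherer jar (placeholder path)
    jar_path: /opt/opentelemetry-java-contrib-jmx-metrics.jar
    # JMX endpoint of the Kafka broker (placeholder host/port)
    endpoint: kafka-broker:9999
    # Collect both Kafka broker and JVM metrics
    target_system: kafka,jvm
    collection_interval: 60s

exporters:
  signalfx:
    access_token: ${SPLUNK_ACCESS_TOKEN}
    realm: us1   # placeholder realm

service:
  pipelines:
    metrics:
      receivers: [jmx]
      exporters: [signalfx]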



Logs:
---------------------------------------------------------------------------------------------------
ScopeMetrics #0
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.sdk.logs
Metric #0
Descriptor:
-> Name: queueSize
-> Description: The number of items queued
-> Unit: 1
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> processorType: Str(BatchLogRecordProcessor)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0
ScopeMetrics #1
ScopeMetrics SchemaURL:
InstrumentationScope io.opentelemetry.contrib.jmxmetrics 1.48.0-alpha
Metric #0
Descriptor:
-> Name: kafka.request.time.avg
-> Description: The average time the broker has taken to service requests
-> Unit: ms
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> type: Str(produce)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0.000000
Metric #1
Descriptor:
-> Name: jvm.memory.pool.init
-> Description: current memory pool usage
-> Unit: By
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> name: Str(CodeHeap 'non-profiled nmethods')
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 2555904
Metric #4
Descriptor:
-> Name: kafka.max.lag
-> Description: Max lag in messages between follower and leader replicas
-> Unit: {message}
-> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0
Metric #5
Descriptor:
-> Name: kafka.partition.under_replicated
-> Description: The number of under replicated partitions
-> Unit: {partition}
-> DataType: Gauge
NumberDataPoints #0
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0
Metric #6
Descriptor:
-> Name: kafka.request.time.50p
-> Description: The 50th percentile time the broker has taken to service requests
-> Unit: ms
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> type: Str(produce)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> type: Str(fetchconsumer)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 0.000000
NumberDataPoints #2
Data point attributes:
-> type: Str(fetchfollower)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 500.000000
Metric #7
Descriptor:
-> Name: kafka.purgatory.size
-> Description: The number of requests waiting in purgatory
-> Unit: {request}
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> type: Str(fetch)
StartTimestamp: 2025-12-30 18:12:56.077595 +0000 UTC
Timestamp: 2026-01-01 18:35:56.209216 +0000 UTC
Value: 6135


aashoksi_cisco
Splunk Employee

Hi Wander,
Thanks for the inputs!

I had the same thought about normalizing the dimensions, and I have already done that. As shown in the detail below, every JVM and JMX metric carries the same resource attributes. These are verified in SignalFx usage analytics as well, yet the metrics are still not visible in the metric list. I also tried other options, such as a chart with gauge semantics, but no luck.


Dimensions:

-> service.name:
-> telemetry.sdk.language:
-> telemetry.sdk.name:
-> telemetry.sdk.version:
-> cloud.provider:
-> cloud.platform:
-> host.id:
-> cloud.availability_zone:
-> cloud.region:
-> cloud.account.id:
-> host.image.id:
-> host.type:
-> host.name:
-> os.type:
-> k8s.pod.ip:
-> k8s.pod.name:
-> k8s.pod.uid:
-> k8s.namespace.name:
-> deployment.environment:
-> otelcol.service.mode:

Wander
Explorer

Thanks for the screenshot. That helps a lot.

I'm thinking the metrics are ingesting fine and SignalFx is creating time series, but they're likely keyed on high-cardinality Kubernetes dimensions, such as pod name or pod UID, instead of a stable Kafka identity. That's why Usage Analytics shows activity but charts come back empty.

How about trying to add a stable Kafka dimension, for example kafka.cluster, broker.id, or broker.name? That way you're not relying on k8s.pod.uid as the primary identity. Then, in the chart, explicitly group by that Kafka dimension and use mean or max.
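
Something roughly like this with the collector's attributes processor could stamp that identity onto every datapoint; the cluster name and broker id here are just example values:

processors:
  attributes/kafka_identity:
    actions:
      # Example values -- replace with whatever identifies your cluster/broker
      - key: kafka.cluster
        value: my-kafka-cluster
        action: upsert
      - key: broker.id
        value: "0"
        action: upsert

Just make sure the processor is also listed in your metrics pipeline, otherwise it won't run.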

Definitely keen to find out whether this helps or not.


aashoksi_cisco
Splunk Employee

I have tried with a stable dimension as well, but still no luck.

Regarding this point - "One more thing to note: the Kafka JMX receiver you’re using is still alpha. It’s known to emit inconsistent metadata for some MBeans, which makes this behavior more likely."

Could you please tell me how we can use a beta or stable receiver, since as per the docs it ships with the splunk-otel-collector release itself, and even in the latest otel-collector version, v0.141.0, the receiver is still "1.52.0-alpha"?

Current detail with v0.134.0:
InstrumentationScope io.opentelemetry.contrib.jmxmetrics 1.48.0-alpha

Thanks!


Wander
Explorer

This is a fun one, and your logs actually show the collector is doing its job.

Since the metrics show up in Usage Analytics, SignalFx is receiving them. When charts show “0 time series,” it’s usually a dimensions problem, not an ingestion problem.

Kafka JMX metrics often come in with attributes like type=produce or type=fetchconsumer, but they’re missing stable identity fields like host.name, service.name, or a cluster identifier. SignalFx needs a consistent set of dimensions to form a time series. If those change between scrapes, the metric exists but won’t chart.

A second gotcha is that many of these metrics are reporting 0 or constant values. If you’re using rate or delta functions, or the wrong rollup, the chart can look empty. Make sure you’re treating them as gauges and using something simple like “Last value” or “Mean.”

One more thing to note: the Kafka JMX receiver you’re using is still alpha. It’s known to emit inconsistent metadata for some MBeans, which makes this behavior more likely.

That’s why queueSize works. It’s a simple internal metric with clean, stable dimensions. The Kafka and JVM JMX metrics need a bit of normalization first.

Try this:
- Add explicit service.name, host.name, and a Kafka cluster tag in the OTel Collector (see the sketch after this list)
- Normalize dimensions so every datapoint looks the same
- Chart with gauge semantics, not rates
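
Here's a rough sketch of what the first two bullets could look like in collector config; the service.name, host.name, and kafka.cluster values are examples only, not something your environment has to use:

processors:
  resource/kafka:
    attributes:
      # Example values -- use whatever identifies your broker and cluster
      - key: service.name
        value: kafka-broker
        action: upsert
      - key: host.name
        value: kafka-broker-0
        action: upsert
      - key: kafka.cluster
        value: my-kafka-cluster
        action: upsert

service:
  pipelines:
    metrics:
      receivers: [jmx]
      processors: [resource/kafka]
      exporters: [signalfx]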
