Splunk Observability Cloud

otel operator endpoint invalid - Auto instrumentation broken?

nickhills_encc
Observer

On a fresh deployment of O11y, we are following the guide from the setup wizard and running the following helm install command:

 

helm install splunk-otel-collector --set="cloudProvider=aws,distribution=eks,splunkObservability.accessToken=xxxx-xxxxx,clusterName=eks-test-cluster,splunkObservability.realm=eu2,gateway.enabled=true,splunkPlatform.endpoint=https://my-splunk-cloud-hec-input,splunkPlatform.token=my-token,splunkObservability.profilingEnabled=true,environment=test,operator.enabled=true,agent.discovery.enabled=true" splunk-otel-collector-chart/splunk-otel-collector

 

However this fails with the following error:

Error: INSTALLATION FAILED: template: splunk-otel-collector/templates/operator/instrumentation.yaml:2:4: executing "splunk-otel-collector/templates/operator/instrumentation.yaml" at <include "splunk-otel-collector.operator.validation-rules" .>: error calling include: template: splunk-otel-collector/templates/operator/_helpers.tpl:17:13: executing "splunk-otel-collector.operator.validation-rules" at <.Values.instrumentation.exporter.endpoint>: nil pointer evaluating interface {}.endpoint

This seems to be because _helpers.tpl expects a value for instrumentation.exporter.endpoint, whereas the value according to the chart (and the documentation) is instrumentation.endpoint.

https://github.com/signalfx/splunk-otel-collector-chart/blob/main/helm-charts/splunk-otel-collector/...
Line 13 is where it is mentioned.

We have tried providing instrumentation.exporter.endpoint as an additional parameter, but then we get this error instead:

Values don't meet the specifications of the schema(s) in the following chart(s):
splunk-otel-collector:
- instrumentation: Additional property exporter is not allowed

(Which is true: instrumentation.exporter.endpoint is not defined here, at line 20: https://github.com/signalfx/splunk-otel-collector-chart/blob/main/helm-charts/splunk-otel-collector/... )

We also get the same error if we provide a complete values.yaml file with both formats of the instrumentation endpoint defined.

It looks like _helpers.tpl was edited to include this endpoint specification about a month ago, so surely we cannot be the first people to be tripped up by this?
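
One way to check which keys a given chart version accepts is to dump its default values and look at the instrumentation block. The following is a minimal sketch, assuming the repo alias splunk-otel-collector-chart from the install command above. The function name show_instrumentation_values is made up for illustration, and if the key is absent or nested elsewhere in the version you query, nothing is printed.

```shell
# Hypothetical helper: print the top-level `instrumentation:` block
# from the chart's default values.
# Usage: show_instrumentation_values [chart-version]
show_instrumentation_values() {
  chart="splunk-otel-collector-chart/splunk-otel-collector"
  if [ -n "$1" ]; then
    helm show values "$chart" --version "$1"
  else
    helm show values "$chart"
  fi |
    # Start printing at `instrumentation:` and stop at the next top-level key.
    awk '/^instrumentation:/ {p=1; print; next} p && /^[^[:space:]#]/ {p=0} p'
}
```

Running it with and without a version argument (for example `show_instrumentation_values 0.112.0`) may show whether the key layout differs between the chart version you have cached and the one you intend to install.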



Is there anything else I can try or do we need to wait for the operator to be fixed?
 
0 Karma

nickhills_encc
Observer

Just to follow up: if we disable the operator, the deployment is successful, but we have no APM.

This issue seems to relate specifically to the operator and the APM instrumentation.
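
Since the failure happens while the templates are rendered, it may be possible to reproduce it without touching the cluster by rendering the chart locally with helm template. This is a sketch only: render_check is a hypothetical wrapper, and it assumes the same release and chart names as the install command above.

```shell
# Hypothetical helper: render the chart locally and discard the manifests.
# The exit status is helm's own, so a template or schema error shows up
# as a non-zero status with the error message on stderr.
# Usage: render_check --set "key=value,..."   (or: render_check -f values.yaml)
render_check() {
  helm template splunk-otel-collector \
    splunk-otel-collector-chart/splunk-otel-collector "$@" >/dev/null
}
```

Calling it once with operator.enabled=true and once with operator.enabled=false in the --set string should show whether the operator templates alone trigger the error.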

0 Karma

ramkumarvasu
Loves-to-Learn

Yes, I hit the same issue when I tried. The deployment only goes through when the operator flag is disabled.

0 Karma

hdjlassi
Splunk Employee

Welcome to Splunk O11y Cloud, Nickhills!

 

From your first message, I see that you are deploying the collector as a gateway (in the helm install command, the gateway.enabled parameter is set to true). Can you confirm that you need to set up OpenTelemetry as a gateway? If not, please try to re-install as an agent and let us know how it goes.

 

Regards,

Houssem


nickhills_encc
Observer
helm install -n splunk --create-namespace splunk-otel-collector --set="cloudProvider=aws,distribution=eks,splunkObservability.accessToken=xxxxxx,clusterName=eks-uk-test,splunkObservability.realm=eu2,gateway.enabled=false,splunkPlatform.endpoint=xxxxxxx,splunkPlatform.token=xxxxx,splunkObservability.profilingEnabled=true,environment=test,operator.enabled=true,agent.discovery.enabled=true" splunk-otel-collector-chart/splunk-otel-collector

Still gives:

Error: INSTALLATION FAILED: template: splunk-otel-collector/templates/operator/instrumentation.yaml:2:4: executing "splunk-otel-collector/templates/operator/instrumentation.yaml" at <include "splunk-otel-collector.operator.validation-rules" .>: error calling include: template: splunk-otel-collector/templates/operator/_helpers.tpl:17:13: executing "splunk-otel-collector.operator.validation-rules" at <.Values.instrumentation.exporter.endpoint>: nil pointer evaluating interface {}.endpoint

0 Karma

nickhills_encc
Observer

We did eventually resolve this, but it took multiple steps.

We were indeed using an old version of helm. Updating to 3.16 let us make further progress, but that moved the problem on to a compatibility/dependency issue with prometheus (specifically prometheus-operator).

Switching to the latest otel chart (--version 0.112.0) brought us one step closer. However, there was a breaking change in values.yaml between 0.110 and 0.112, which meant we needed to rewrite our local values file.

Long story short: helm 3.16 + collector & values 0.112.0 worked for us.
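
The combination above could be pinned roughly as follows. This is a sketch rather than the exact command used: install_pinned is a hypothetical wrapper, the splunk namespace is taken from the earlier install command, and the values file is assumed to be already rewritten for the 0.112.0 layout.

```shell
# Hypothetical helper: refresh the repo index, then install or upgrade
# the release with the chart version pinned explicitly.
# Usage: install_pinned [values-file]   (defaults to values.yaml)
install_pinned() {
  helm repo update &&
    helm upgrade --install splunk-otel-collector \
      splunk-otel-collector-chart/splunk-otel-collector \
      --namespace splunk --create-namespace \
      --version 0.112.0 \
      --values "${1:-values.yaml}"
}
```

Pinning --version keeps the chart and the local values file in step, so a later chart release with a different values layout is not picked up by accident.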

 

 

0 Karma

ramkumarvasu
Loves-to-Learn

Can I ask what changes were made to the values file? I was able to get the OTel chart from the GitHub project.

0 Karma

bishida
Splunk Employee

Hi,

This error seems to occur with older versions of helm. Can you please confirm your version of helm and see if it's possible to update to a current version?
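
A quick way to compare the installed helm client against a minimum version is sketched below. The function name helm_at_least is made up for illustration, and the comparison relies on sort -V, which is available in GNU coreutils and most BSDs but is not POSIX.

```shell
# Hypothetical helper: succeed if the helm client is at least version $1.
# Usage: helm_at_least 3.16.0
helm_at_least() {
  min="$1"
  # `helm version --template` prints only the requested field, e.g. v3.14.2
  have=$(helm version --template '{{.Version}}') || return 2
  have="${have#v}"
  # With version sort, the minimum must come out first (or be equal).
  lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n 1)
  [ "$lowest" = "$min" ]
}
```

For example, `helm_at_least 3.16.0 || echo "helm is older than 3.16.0"`. A passing check only rules out the client version as the cause; it does not by itself mean the install will succeed.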

0 Karma

ramkumarvasu
Loves-to-Learn

Helm version.BuildInfo{Version:"v3.14.2", GitCommit:"", GitTreeState:"clean", GoVersion:"go1.22.7"}

I used helm from Azure Cloud Shell and also tried GCP Cloud Shell. Both had a similar issue. Do I need to install kubectl and helm locally and try again?

0 Karma

ramkumarvasu
Loves-to-Learn

I also tried with the latest version, 3.16.3, and it is still the same issue.

0 Karma