I would like to create an alert for agent failures. The alert should trigger when the Splunk OTel agent fails to run on any host for a given time window. I also need a dashboard that lists the hosts where the agent is running and not running, along with the agent version.
Hi,
There is a metric that can help with this: otelcol_process_uptime. If a host stops reporting it, the collector on that host is most likely not running.
This metric also has a property, service_version, that gives the version of the collector that is running.
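To illustrate the detection logic, here is a minimal Python sketch. It is not Splunk's API. It assumes you already have the most recent otelcol_process_uptime datapoints per host, each with its service_version property and a timestamp, plus a list of hosts where the agent is expected to run. The `Datapoint` type, the `classify_hosts` function and the host names are all hypothetical. A host counts as running if it reported within the chosen window, and as not running otherwise.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    # Hypothetical shape of one otelcol_process_uptime datapoint.
    host: str
    service_version: str
    timestamp: float  # seconds since epoch


def classify_hosts(datapoints, expected_hosts, now, window_seconds):
    """Split expected hosts into running (host -> version) and not running."""
    # Keep only the most recent datapoint for each host.
    latest = {}
    for dp in datapoints:
        prev = latest.get(dp.host)
        if prev is None or dp.timestamp > prev.timestamp:
            latest[dp.host] = dp

    running = {}
    not_running = []
    for host in sorted(expected_hosts):
        dp = latest.get(host)
        # Running only if the host reported within the time window.
        if dp is not None and now - dp.timestamp <= window_seconds:
            running[host] = dp.service_version
        else:
            not_running.append(host)
    return running, not_running


if __name__ == "__main__":
    points = [
        Datapoint("web-01", "0.98.0", 1000.0),  # reported 60 s ago
        Datapoint("web-02", "0.97.0", 400.0),   # reported 660 s ago
    ]
    running, not_running = classify_hosts(
        points, {"web-01", "web-02", "db-01"}, now=1060.0, window_seconds=300
    )
    print(running)      # {'web-01': '0.98.0'}
    print(not_running)  # ['db-01', 'web-02']
```

The function needs an expected host list because a host whose agent never reported does not appear in the metric data at all. An alert would fire whenever `not_running` is non-empty. In Splunk itself, the natural place for this logic is a detector that watches for the metric to stop reporting.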
Please also check the built-in dashboards under Dashboards -> Built-in dashboard groups -> "OpenTelemetry Collector".
Thanks,