
Check that a process is still running on each server


I'm trying to figure out a way to verify that a process is running on each server. We have a clustered environment with 4 servers running WebSphere Application Server, and I want to confirm that the cluster is healthy at the process level.

This is the closest I've gotten:

 sourcetype="ps" "SOME_UNIQUE_PROCESS_NAME" | timechart count by host

Pushing this into an area graph gets close, except that in intervals where the ps source isn't sampled the count comes back as 0 instead of 1. It also isn't very robust, as it depends on the span of each timechart bucket relative to how often ps is sampled.

I'm using the Splunk for Unix and Linux app which is where the sourcetype of 'ps' comes from.


I think I figured it out. Could someone validate whether this is the most efficient way to do it?

 sourcetype="ps" | eval processexists=if(match(_raw, "SOME_UNIQUE_PROCESS_NAME"), 1, 0) | timechart span=1m avg(processexists) by host

This works as long as ps produces at least one event per host within each 1-minute span. The one rough edge: if you are sampling every 30 seconds, the value reads 0.5 for the minute in which the process goes from present (1) to absent (0), because one sample in that span matches and the other does not.
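To make the 0.5 case concrete, here is a small Python sketch of the arithmetic only. It does not talk to Splunk; the `bucket_stats` helper and the sample timestamps are made up for illustration. It groups 0/1 flags into 60-second buckets and reports the average for each, which is my understanding of what `timechart span=1m avg(processexists)` computes per host. It also prints the maximum, on the assumption that swapping `avg` for `max` in the search would report 1 whenever any sample in the span saw the process.

```python
from collections import defaultdict


def bucket_stats(samples, span=60):
    """Group (epoch_seconds, flag) samples into span-second buckets.

    Returns {bucket_start: (avg, max)} for the 0/1 flags in each bucket.
    Hypothetical helper, not part of Splunk.
    """
    buckets = defaultdict(list)
    for ts, flag in samples:
        # Floor each timestamp to the start of its bucket.
        buckets[ts - ts % span].append(flag)
    return {
        start: (sum(flags) / len(flags), max(flags))
        for start, flags in sorted(buckets.items())
    }


# Made-up samples for one host, taken every 30 seconds:
# the process is present for the first three samples, then gone.
samples = [(0, 1), (30, 1), (60, 1), (90, 0), (120, 0), (150, 0)]

stats = bucket_stats(samples)
for start, (avg, peak) in stats.items():
    print(f"t={start:>3}s avg={avg:.1f} max={peak}")
```

The middle bucket holds one matching and one non-matching sample, so its average is 0.5 while its maximum is still 1. Whether `max` is the better choice depends on whether you want a partial outage within a minute to show up on the chart.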
