I have a simple Flask webhook running on my Splunk server that is managed by supervisord. Since I'd like to know whether the supervisord process is running, I'm looking for a way to get Splunk to run
ps aux | grep supervisord | grep -v grep
and send an alert when there are no results. Is there a way to get Splunk to do that, or are we looking at an alert that calls a Python script that writes to a log file that is in turn indexed by Splunk? Is there a way to get this process information into the _introspection index by updating some config files? Before setting off on this journey I'd like to get some input from the experts!
Both @woodcock and @mmodestino have provided the answer. I'm just chiming in to complete their answer with what I think you are looking for.
In addition to the *nix app, if you only want to capture the times when the process isn't running (hopefully that will be less data than logging every instance of it running), then use a shell command like this:
if [ `ps ax | grep myprocess | grep -v grep | wc -l` -lt 1 ]; then echo "myprocess - Not running" ; fi
This will only output data if the process isn't running. Now, putting that into the *nix application shouldn't be too hard, but ask here if you need more information on doing that.
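As a minimal sketch of what that wrapping could look like (the script name, message format, and timestamp are just examples, not anything from the *nix app itself):

#!/bin/sh
# check_supervisord.sh - emit a line only when supervisord is NOT running.
# Splunk will timestamp events at index time anyway, but including one in
# the output makes the raw event self-describing.
if [ `ps ax | grep supervisord | grep -v grep | wc -l` -lt 1 ]; then
    echo "`date '+%Y-%m-%d %H:%M:%S'` supervisord - Not running"
fi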
Have a look at scripted inputs and perhaps the Splunk *nix TA for some inspiration!
The *nix TA has its own version of a check on ps:
[script://./bin/ps.sh]
interval = 30
sourcetype = ps
source = ps
index = os
disabled = 1
You can simply alter that inputs.conf stanza to point it at your own .sh script, and tell Splunk how often to run it and what index and sourcetype to use. Splunk will ingest the output and you can analyze it from there.
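For instance, an adapted stanza might look like this (the script name, interval, sourcetype, and index below are placeholders, pick whatever fits your environment):

[script://./bin/check_supervisord.sh]
interval = 60
sourcetype = supervisord_check
source = supervisord_check
index = os
disabled = 0

Since the script only emits output when the process is down, you could then build an alert on a search like index=os sourcetype=supervisord_check that triggers whenever results are present.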
I built something similar that checked for rsyslog by using pgrep, loosely based on this:
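As an illustration of the pgrep approach (not the exact script from that post, and using supervisord rather than rsyslog to match your case), the check could be as simple as:

#!/bin/sh
# Emit an event only when the process is missing; pgrep -x matches the
# exact process name and exits non-zero when nothing is found.
if ! pgrep -x supervisord > /dev/null; then
    echo "supervisord - Not running"
fi

pgrep also sidesteps the grep -v grep dance, since it never matches its own process.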