I have a requirement to implement process and service monitoring on a legacy Windows platform via Splunk. I already have a tasklist batch file running every 5 minutes on each server. It writes to a log file that is forwarded to the indexer and gives me a complete list of all running processes and services. The saved search to check that "process X" is running on at least one server in the clustered pair is easy enough to create (e.g. host="servername*" source="*process_list_log" "Process X" startminutesago=5). If all is OK, I should get a result from one server in the pair. What I need is an alert that fires when the number of matches is less than 1, meaning that "process X" isn't running on either server. My guess is that I'll need to do some field comparisons, but that's where things get a bit unclear for me. Any help is appreciated.
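To make the alert logic concrete outside Splunk, here is a minimal Python sketch of the "running on at least one node" check. It assumes the batch file runs `tasklist /FO CSV` (CSV output with a header row whose first column is the image name); the host names and `processx.exe` are hypothetical placeholders. The idea is the same one a saved search would implement: count the hosts where the process appears and alert when that count is less than 1.

```python
import csv
import io


def running_images(tasklist_csv: str) -> set[str]:
    """Return the lower-cased image names from `tasklist /FO CSV` output (assumed format)."""
    reader = csv.reader(io.StringIO(tasklist_csv))
    next(reader, None)  # skip the "Image Name","PID",... header row
    return {row[0].lower() for row in reader if row}  # ignore blank lines


def process_down(logs_by_host: dict[str, str], process: str) -> bool:
    """True (alert) when `process` is running on none of the hosts in the pair."""
    hosts_running = [
        host for host, log in logs_by_host.items()
        if process.lower() in running_images(log)
    ]
    return len(hosts_running) < 1


# Hypothetical sample output from the two nodes of the clustered pair.
HEADER = '"Image Name","PID","Session Name","Session#","Mem Usage"\n'
node_a = HEADER + '"processx.exe","1234","Services","0","10,240 K"\n'
node_b = HEADER + '"svchost.exe","888","Services","0","5,120 K"\n'

print(process_down({"servername1": node_a, "servername2": node_b}, "processx.exe"))  # False
print(process_down({"servername1": node_b, "servername2": node_b}, "processx.exe"))  # True
```

Counting hosts rather than raw matching lines means a process that appears several times on one node still counts as a single running instance for the pair.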
What I forgot to mention is that some processes depend on others. For instance, process X only runs when the server is in "primary" mode, and when a server is primary, processes Y and Z should also be running on that server.
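The primary-mode dependency can be sketched the same way, again in Python and under stated assumptions: each host's running processes are already available as a set of image names, the node running process X is treated as the primary, and `processx.exe`, `processy.exe` and `processz.exe` are hypothetical names. The function returns a list of problems, so an empty list means "no alert".

```python
def dependency_problems(
    running_by_host: dict[str, set[str]],
    trigger: str,
    required: tuple[str, ...],
) -> list[str]:
    """Check that the host running `trigger` (the primary) also runs every `required` process."""
    problems = []
    # Any host running the trigger process is assumed to be in "primary" mode.
    primaries = [host for host, procs in running_by_host.items() if trigger in procs]
    if not primaries:
        problems.append(f"{trigger} is not running on any host")
    for host in primaries:
        for proc in required:
            if proc not in running_by_host[host]:
                problems.append(f"{host}: {trigger} running but {proc} missing")
    return problems


running = {
    "servername1": {"processx.exe", "processy.exe"},
    "servername2": {"svchost.exe"},
}
print(dependency_problems(running, "processx.exe", ("processy.exe", "processz.exe")))
# ['servername1: processx.exe running but processz.exe missing']
```

Keying the dependency check off the node that actually runs process X avoids hard-coding which server is primary, since that can change after a failover.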
If the number you mean is the event count, you could use a simple alert condition on the saved search, for example "if number of events is less than 2". If there will always be exactly one event that should contain multiple values, you could either add a custom condition search to the saved search's alert, or append " | stats count" to the end of your search and alert on that.
Related/Similar question: http://answers.splunk.com/questions/8764/monitoring-file/8765#8765