Alerting

checking if a a process is still running (time based comparison of event counts)

Explorer

Hello,

I want to check if a process is still running. The process is logging periodically a short info on polling a directory.
Now, I will to use that info to detect if the process is not running anymore.
Normally, a simple search for the info would be:

search myprocess is doing the work

and then defining the alert for eventcount = 0 would do the job. But...
I would like to do it in a generic way, that means, that "is doing the work" is actually unknown. This has to work with every process which is logging something periodically.

Now, I have a defined a search doing a simple event count over time periods (hourly):

source=/logs/processes.log earliest=-3h@h | chart count over process by _time span=1h 

this give me a table like this:

process    1381651200    1381654800    1381658400
myproc1    12            12            1
myproc2    15            15            0
myproc3    233           243           102

well, that means that the process "myproc2" did not log anything in the last hour and I need an alert for that issue.

What I've tried is to extract somehow the last column and evaluate the apopriate process name (process column) but I couldn't get it done.

Any Ideas hot to do that?
Maybe I'm thinking too complicated?

Regards,
Peter

0 Karma

Explorer

after hour of reading several postings here finally I found a solution for this problem.

my problem is nothing else as determining the gap between the last log info and now. Well, this search does it for me now:

source=/logs/processes.log earliest=-2h | stats max(_time) As LatestTime by process | eval Gap=(now()-LatestTime) | search Gap>3600

means: every process what was "seen" in the last 2 hours must be "seen" during the last hour also. If not I can raise an alert now!

0 Karma