We are monitoring the same log file on multiple hosts. We have observed that when a particular error is logged, the service on that machine stops, and nothing else is written to the log file after the error. The machine then tries to restart the service automatically, and if it succeeds, normal log entries follow.
Our aim is to capture this particular error, but only alert if it is the last entry in the log file within the last 30 minutes or so.
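To make the rule concrete, here is a minimal Python sketch of the "last entry in the window" check for a single log file. It is not Splunk SPL. It assumes each log line has already been parsed into a timestamp and a message, and the names `LogEntry` and `last_entry_is_error` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Error text taken from the question.
ERROR_TEXT = "***ERROR*** Exception occurred in serviceB_TDR"

@dataclass
class LogEntry:
    # Hypothetical parsed form of one log line.
    time: datetime
    message: str

def last_entry_is_error(entries, now, window=timedelta(minutes=30)):
    """Return True if the newest entry inside the window is the error."""
    recent = [e for e in entries if now - window <= e.time <= now]
    if not recent:
        return False  # nothing was logged in the window at all
    newest = max(recent, key=lambda e: e.time)
    return ERROR_TEXT in newest.message

now = datetime(2024, 1, 1, 0, 0)
entries = [
    LogEntry(datetime(2023, 12, 31, 23, 40), "normal message"),
    LogEntry(datetime(2023, 12, 31, 23, 50), ERROR_TEXT),
]
print(last_entry_is_error(entries, now))  # error is the newest entry
```

If a restart message is appended after the error (for example at 23:55), the same call returns False, because the error is no longer the newest entry.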
Any help on this would be greatly appreciated.
For argument's sake, the error looks like this:
***ERROR*** Exception occurred in serviceB_TDR
Cool, I get it now. However, I need to make sure that an event from another log file isn't "counted", so I suppose I would just do a stats c by host so that the results are unique per host, right?
All I am saying is that we are getting these logs, possibly duplicated, from multiple hosts at the same time. If I do head 1 over, say, the last 30 minutes, what happens when one machine is down but the others are logging as normal? Wouldn't the error I am looking for no longer be the first result in the pipeline, because normal messages from the other hosts would come before it?
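The concern about head 1 can be illustrated with a small Python simulation (again not SPL). It assumes events are (host, time, message) tuples, and the host names and function names are hypothetical. Taking only the single newest event across all hosts hides the failed host, while keeping the newest event per host, in the spirit of the per-host stats split discussed above, still finds it.

```python
from datetime import datetime

# Error text taken from the question.
ERROR_TEXT = "***ERROR*** Exception occurred in serviceB_TDR"

def newest_overall(events):
    """Mimic keeping only the single newest event across all hosts."""
    return max(events, key=lambda e: e[1])

def newest_per_host(events):
    """Keep the newest (time, message) for each host."""
    latest = {}
    for host, time, message in events:
        if host not in latest or time > latest[host][0]:
            latest[host] = (time, message)
    return latest

def hosts_ending_in_error(events):
    """Hosts whose newest event is the error."""
    return sorted(
        host
        for host, (_, message) in newest_per_host(events).items()
        if ERROR_TEXT in message
    )

# Hypothetical data: hostA fails, hostB keeps logging normally.
events = [
    ("hostA", datetime(2024, 1, 1, 0, 1), ERROR_TEXT),
    ("hostB", datetime(2024, 1, 1, 0, 5), "heartbeat ok"),
    ("hostB", datetime(2024, 1, 1, 0, 10), "heartbeat ok"),
]
print(newest_overall(events)[0])      # hostB's heartbeat hides hostA
print(hosts_ending_in_error(events))  # hostA is still detected
```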
Thank you very much for the reply.
These logs come in every minute, so with the above search, what would happen in the following scenario?
The search runs every 30 minutes (say at 00:00) and looks at the last 30 minutes (23:30 to 00:00). Suppose the error appears at 23:59 and is the last line written. This will fire the alert, but the logic I want is to alert only if the error is the last message and has been the last message for 30 minutes.
Yes. Your alert needs to check the value of _time (which is why it is included in the stats) to determine how long ago the error was logged.
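If your alert is only running every half hour, the timeframe for your search should include the past hour, so that an error logged just before one run is still inside the search window at the next run, when it has been the last entry for 30 minutes.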
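The timing logic from the last two replies can be sketched in Python as follows. This is a simulation of the idea, not the actual Splunk search. It assumes (host, time, message) tuples, and `hosts_to_alert`, `lookback` and `quiet` are hypothetical names. Using the scenario from the question, the error at 23:59 should not alert at 00:00 (it is only a minute old), should alert at 00:30 with a one-hour lookback, and is missed at 00:30 if the lookback is only 30 minutes.

```python
from datetime import datetime, timedelta

# Error text taken from the question.
ERROR_TEXT = "***ERROR*** Exception occurred in serviceB_TDR"

def hosts_to_alert(events, now, lookback=timedelta(hours=1),
                   quiet=timedelta(minutes=30)):
    """Hosts whose newest event in the lookback window is the error
    and has remained the newest event for at least `quiet`."""
    latest = {}
    for host, time, message in events:
        # Only consider events inside the search time range.
        if not (now - lookback <= time <= now):
            continue
        if host not in latest or time > latest[host][0]:
            latest[host] = (time, message)
    return sorted(
        host
        for host, (time, message) in latest.items()
        if ERROR_TEXT in message and now - time >= quiet
    )

events = [
    ("hostA", datetime(2023, 12, 31, 23, 45), "heartbeat ok"),
    ("hostA", datetime(2023, 12, 31, 23, 59), ERROR_TEXT),
]
print(hosts_to_alert(events, datetime(2024, 1, 1, 0, 0)))   # too recent
print(hosts_to_alert(events, datetime(2024, 1, 1, 0, 30)))  # 31 minutes old
print(hosts_to_alert(events, datetime(2024, 1, 1, 0, 30),
                     lookback=timedelta(minutes=30)))       # outside window
```

The third call shows why a 30-minute window is not enough for a half-hourly schedule: by the time the error is old enough to alert on, it has already dropped out of the window.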