Alerting

Alert if last log message is X in last 30 minutes

greekleo89
Loves-to-Learn Everything

Hi All,

We are monitoring the same log file across multiple hosts, and we have observed that when a particular error gets logged, the service on that machine stops. When this happens, nothing else is logged in the log file but the error. The machine will automatically try to bring the service back up, and if it succeeds, normal log entries follow.

Aim:
Our aim is to capture this particular error, but only alert if that error is the last entry in this log file within the last 30 minutes or so.

Any help on this would be greatly appreciated.

For argument's sake, the error looks like this:
***ERROR*** Exception occurred in serviceB_TDR


ITWhisperer
SplunkTrust

Assuming events are returned newest first, one way to do this would be:

| head 1
| search "***ERROR*** Exception occurred in serviceB_TDR"
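
In full, with a placeholder base search (the index and sourcetype names here are assumptions — substitute your own):

index=your_index sourcetype=your_sourcetype earliest=-30m
| head 1
| search "***ERROR*** Exception occurred in serviceB_TDR"

head 1 keeps only the first event in the pipeline, which for a default time-sorted search is the most recent one.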

 


greekleo89
Loves-to-Learn Everything

By that, do you mean events in Splunk's indexer?

As per the norm, the newest event/entry is always at the bottom of the log file.

 

Thanks


ITWhisperer
SplunkTrust

I mean when you do a search in Splunk, which event comes back first? head 1 will just keep the first event in the pipeline.


greekleo89
Loves-to-Learn Everything

Cool, I get you now. I need to make sure, however, that an event from another log file isn't "counted", so I suppose I would just do a stats count by host so that the results are unique per host, right?


All I am saying is that since we are getting these logs, which could be duplicates, from multiple hosts at the same time: if I do head 1 for, say, the last 30 minutes, what happens if one machine is down while the others are logging as normal? The error I am looking for will not be the first in the pipeline, as normal messages from the other hosts would come first.


ITWhisperer
SplunkTrust
| stats latest(_raw) as _raw latest(_time) as _time by host
| search "***ERROR*** Exception occurred in serviceB_TDR"

greekleo89
Loves-to-Learn Everything

Thank you very much for the reply.

 

These logs come in every minute, so with the above search, what would happen in the following scenario:

The search runs every 30 minutes (say at 00:00) and looks at the last 30 minutes (23:30-00:00). For argument's sake, the error appears at 23:59 and is the last line written. This will fire the alert; however, the logic I want to apply is to alert only if it is the last message and has been the last message for 30 minutes.


ITWhisperer
SplunkTrust

Yes, your alert needs to check the value of _time (which is why it is included in the stats) to see how long ago the event was.

If your alert is only running every half hour, the timeframe for your search should include the past hour.
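
Putting the thread together, one sketch of the complete alert search (index and sourcetype names are placeholders, and the hour-long window follows the note above) could be:

index=your_index sourcetype=your_sourcetype earliest=-60m
| stats latest(_raw) as _raw latest(_time) as _time by host
| search "***ERROR*** Exception occurred in serviceB_TDR"
| where _time <= relative_time(now(), "-30m")

This keeps only the hosts whose most recent event is the error, then alerts only when that event is at least 30 minutes old, i.e. nothing normal has been logged since.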


greekleo89
Loves-to-Learn Everything

Great thank you.


greekleo89
Loves-to-Learn Everything

The full log line is: 04/11/2022 17:47:58.846593 [Machine1] ***ERROR*** Exception occurred in serviceB_TDR
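
If the bracketed machine name ever needs to be pulled out of the raw event (for example, to compare it against the Splunk host field), it could be extracted with a rex like this (the field name machine is an assumption):

| rex "\[(?<machine>[^\]]+)\]\s+\*\*\*ERROR\*\*\*"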
