Hi,
I need to monitor "host failure events" per hour over the last 24 hours for a group of 50 hosts. When the total for any host reaches a threshold such as 10 failures, an alert email needs to be sent. This counting and totaling needs to happen every hour.
What I want to do is schedule a report that counts the failures for each host per hour, saves the count, and then adds the next hourly count to the previous one. When any host reaches 10 failures within the 24-hour window, the triggered action needs to send an email.
At midnight, I would like to reset the counts.
Any advice appreciated.
Thank you
index=foo "failed"
| stats min(_time) as _time count by host
| eval _time=strftime(_time,"%F %H%M")
| outputlookup append=t Failed_Count
It's better to add _time and use outputlookup with append=true.
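Assuming that search runs as an hourly scheduled report, you would restrict it to the previous hour. A minimal sketch of that scheduling assumption (the earliest/latest values are mine, not required by the method):
index=foo "failed" earliest=-1h@h latest=@h
| stats min(_time) as _time count by host
| eval _time=strftime(_time, "%F %H%M")
| outputlookup append=t Failed_Count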
For alerting:
| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(),"-1d")
| stats sum(count) as total by host
| where total > 10
If event count > 0, fire alert.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Outputlookup
index=foo "failed"
| stats min(_time) as _time count by host
| eval _time=strftime(_time,"%F %H%M")
| outputlookup append=t Failed_Count
It's better to add _time
and use outputlookup
with append=true
.
For alerting:
| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(),"-1d")
| stats sum(count) as total by host
| where total > 10
If event count > 0, fire alert.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Outputlookup
WOW that is awesome!!!
I was going round and round not quite getting it... but that is exactly what I was trying to do...
although the system admin said that my default query would work as well, running every hour and sending an alert when the result count is greater than 1:
index=<foo> earliest=-24h@h latest=@h "<some bad failure msg>"
| bin _time span=1h
| stats count by host _time
| eventstats sum(count) as totalCount by host
| where totalCount > 10
One follow-up question: if I keep your outputlookup method running, how do I purge the old data after a day or so? The file might grow to a huge size and cause issues (I am thinking...).
Thank you very much !!!
https://splunkbase.splunk.com/app/1724/
or delete old rows with a script,
or run another scheduled search that checks the lookup and removes the extra rows (see the sketch below).
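As a minimal sketch, assuming the lookup keeps the _time format used above, a search scheduled daily at midnight (or hourly) could rewrite the lookup so only the last 24 hours of rows survive:
| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(), "-1d")
| outputlookup Failed_Count
Because that final outputlookup has no append=t, it overwrites the file, which keeps it from growing.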
Thank you! Please convert the previous comment to an answer and I will accept it.
I was thinking about using a summary index with a 24-hour look-back each hour, but someone mentioned using an outputlookup instead...
So I am trying the outputlookup method...
I created a lookup called "Failed_Count" backed by a .csv file that contains two fields: host,count.
I can run a query like this:
index=foo "failed" | stats count by host | outputlookup Failed_Count
and it updates the lookup, but I have had no luck adding the previous hour's count to the total...
Any ideas?
...| table the fields from inputlookup, add the results from the current search to the table, then | outputlookup...
I am guessing.
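A rough sketch of that idea, using the index and search string from my query above and the two fields host,count in the lookup, might look something like this:
index=foo "failed"
| stats count by host
| append [| inputlookup Failed_Count]
| stats sum(count) as count by host
| outputlookup Failed_Count
That would add the previous totals from the lookup to the current hour's counts and write the new totals back, though it only keeps one running total per host; keeping per-hour rows with _time and outputlookup append=t would make the 24-hour window easier to handle.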