Splunk Search

How to create an alert that adds the previous results to an hourly running total for a 24-hour period

Glasses
Contributor

Hi,
I need to monitor "host failure events" per hour over the last 24 hours for a group of 50 hosts. When the total reaches a threshold, say 10 failures, an alert email needs to be sent. The count and running total need to update each hour.

What I want to do is schedule a report that counts the failures by host each hour, saves the count, and adds each new hourly count to the previous running total. When any host reaches 10 failures within the 24-hour window, the triggered action needs to send an email.

At midnight, I would like to reset the count.

Any advice appreciated.

Thank you

1 Solution

to4kawa
Ultra Champion
index=foo "failed" 
| stats min(_time) as _time count by host 
| eval _time=strftime(_time,"%F %H%M")
| outputlookup append=t Failed_Count

It's better to add _time and use outputlookup with append=true, so each hourly run appends its rows to the lookup instead of overwriting it.

For alerting:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(),"-1d")
| stats sum(count) as total by host
| where total > 10

If the result count is greater than 0, fire the alert.

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Outputlookup



Glasses
Contributor

WOW that is awesome!!!

I was going round and round not quite getting it... but that is exactly what I was trying to do...

although the system admin said that my default query would work as well - running every hour and sending an alert when results are greater than 1

index=<foo> earliest=-24h@h latest=@h "<some bad failure msg>"
| bin _time span=1h
| stats count by host _time
| eventstats sum(count) as totalCount by host
| where totalCount > 10

One follow-up question: if I keep your outputlookup method running, how do I purge the old data after a day or so? The file might grow to a huge size and cause issues (I am thinking...)

Thank you very much !!!


to4kawa
Ultra Champion

https://splunkbase.splunk.com/app/1724/
or delete by script
or make another query to check and delete extra rows.
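The third option can be sketched as a scheduled search that rewrites the lookup keeping only the recent rows. This is a sketch, not a tested configuration; it assumes the Failed_Count lookup and the "%F %H%M" time format from the answer above:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(), "-1d")
| outputlookup Failed_Count

Because this outputlookup runs without append=t, it overwrites the file, so rows older than one day are dropped each time the search runs.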


Glasses
Contributor

Thank you! Please convert the previous reply to an answer and I will accept it.


Glasses
Contributor

I was thinking about using a summary index with a 24-hour lookback each hour, but someone mentioned using an outputlookup instead...


Glasses
Contributor

So trying the outputlookup method...

I created a lookup called "Failed_Count" with a file.csv that contains two fields: host and count.
I can run a query like this:
index=foo "failed" | stats count by host | outputlookup Failed_Count
and it updates, but I have no luck adding the previous hour's count to the total...

Any ideas?


Glasses
Contributor

...|table fields from inputlookup add results from current search to table then | outputlookup...

I am guessing
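That guess can be written out as a single search. This is only a sketch of the pattern being described (it assumes the Failed_Count lookup already has host and count columns, and it keeps a running total with no time window or reset):

index=foo "failed"
| stats count by host
| inputlookup append=t Failed_Count
| stats sum(count) as count by host
| outputlookup Failed_Count

Here inputlookup with append=t adds the stored rows to the current search results, the second stats merges the old and new counts per host, and outputlookup writes the combined totals back. The _time-stamping approach in the accepted answer is what makes the 24-hour window and reset possible.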
