Splunk Search

How to create an alert that adds the previous results to an hourly running total for a 24-hour period

Communicator

Hi,
I need to monitor "host failure events" per hour over the last 24 hours for a group of 50 hosts. When the total for a host reaches a threshold, say 10 fails, an alert email needs to be sent. This count and running total need to be updated each hour.

What I want to do is schedule a report that counts the fails by each host per hour, saves the count, and then adds each new hourly count to the previous total. When any host reaches 10 fails within the 24-hour window, the triggered action needs to send an email.

At midnight, I would like to reset the count.

Any advice appreciated.

Thank you

1 Solution

SplunkTrust
index=foo "failed" 
| stats min(_time) as _time count by host 
| eval _time=strftime(_time,"%F %H%M")
| outputlookup append=t Failed_Count

The key is to record _time with each row and to use outputlookup with append=true, so each hourly run adds rows to the lookup instead of overwriting it.

For alerting:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(),"-1d")
| stats sum(count) as total by host
| where total > 10

Set the alert to trigger when the number of results is greater than 0.

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Outputlookup



Communicator

WOW that is awesome!!!

I was going round and round not quite getting it... but that is exactly what I was trying to do...

although the system admin said that my default query would work as well - running every hour and sending an alert when the number of results is greater than 0

index=<foo> earliest=-24h@h latest=@h "<some bad failure msg>"
| bin _time span=1h
| stats count by host _time
| eventstats sum(count) as totalCount by host
| where totalCount > 10

One follow-up question: if I keep your outputlookup method running, how do I purge the old data after a day or so? The file might grow to a huge size and cause issues (I am thinking...)

Thank you very much !!!


SplunkTrust

Use https://splunkbase.splunk.com/app/1724/,
or delete old rows by script,
or run another scheduled search that checks the lookup and deletes the extra rows.
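
For that last option, a scheduled search can rewrite the lookup in place, keeping only the most recent day of rows. A minimal sketch, assuming the _time string format ("%F %H%M") written by the collection search above:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(), "-1d")
| outputlookup Failed_Count

Because this final outputlookup runs without append=t, it overwrites Failed_Count with only the rows that passed the where filter, so rows older than 24 hours are dropped each time it runs.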


Communicator

Thank you! Please convert the previous comment to an answer and I will accept it.


Communicator

I was thinking about using a summary index with a 24-hour look-back each hour, but someone mentioned using an outputlookup instead...


Communicator

So, trying the outputlookup method...

I created a lookup called "Failed_Count" backed by a file.csv that contains two fields: host and count.
I can run a query like this:

index=foo "failed" | stats count by host | outputlookup Failed_Count

and it updates, but I have no luck adding the previous hour's count to the total...

Any ideas?


Communicator

...| use inputlookup to read the fields already in the table, add the results from the current search to them, then | outputlookup...

I am guessing
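
That guess can be sketched in SPL (assuming the same index=foo search and Failed_Count lookup as above; inputlookup with append=t appends the existing lookup rows to the current search results before the overwrite):

index=foo "failed"
| stats count by host
| inputlookup append=t Failed_Count
| stats sum(count) as count by host
| outputlookup Failed_Count

This keeps one running total per host, but with no _time column there is no way to expire rows older than 24 hours - which is why the accepted answer stores _time and uses outputlookup append=t instead.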
