Splunk Search

How to create an alert that adds the previous results to an hourly running total for a 24-hour period

Glasses
Builder

Hi,
I need to monitor "host failure events" per hour over the last 24 hours for a group of 50 hosts. When the total for any host reaches a threshold such as 10 failures, an alert email needs to be sent. The count and running total need to update each hour.

What I want to do is schedule a report that counts the failures by host each hour, saves the count, and then adds the next hourly count to the previous total. When any host reaches 10 failures within the 24-hour window, the triggered action needs to send an email.

At midnight, I would like to reset the count.

Any advice appreciated.

Thank you

1 Solution

to4kawa
Ultra Champion
index=foo "failed" 
| stats min(_time) as _time count by host 
| eval _time=strftime(_time,"%F %H%M")
| outputlookup append=t Failed_Count

It's better to record _time and use outputlookup with append=true, so each hourly run adds new rows instead of overwriting the lookup.

For alerting:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(),"-1d")
| stats sum(count) as total by host
| where total > 10

Set the alert to trigger when the number of results is greater than 0.

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Outputlookup



Glasses
Builder

WOW that is awesome!!!

I was going round and round not quite getting it... but that is exactly what I was trying to do...

Although the system admin said that my original query would work as well, running every hour and sending an alert when the number of results is greater than 1:

index=<foo> earliest=-24h@h latest=@h "<some bad failure msg>"
| bin _time span=1h
| stats count by host _time
| eventstats sum(count) as totalCount by host
| where totalCount > 10

One follow-up question: if I keep your outputlookup method running, how do I purge the old data after a day or so? I am thinking the file might grow to a huge size and cause issues.

Thank you very much !!!


to4kawa
Ultra Champion

https://splunkbase.splunk.com/app/1724/
or delete rows with a script,
or run another scheduled query that checks for and removes old rows.
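
For the third option, a scheduled cleanup search could rewrite the lookup keeping only the last day of rows. A minimal sketch, assuming the same Failed_Count lookup and _time string format used in the solution above:

| inputlookup Failed_Count
| where strptime(_time, "%F %H%M") > relative_time(now(), "-1d")
| outputlookup Failed_Count

Because this outputlookup runs without append=t, it replaces the file with only the rows that survive the where filter.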


Glasses
Builder

Thank you! Please convert the previous comment to an answer and I will accept it.


Glasses
Builder

I was thinking about using a summary index with a 24-hour look-back each hour, but someone suggested using an outputlookup instead...


Glasses
Builder

So, trying the outputlookup method...

I created a lookup called "Failed_Count" backed by a file.csv that contains two fields: host and count.
I can run a query like this:
index=foo "failed" | stats count by host | outputlookup Failed_Count
and it updates the lookup, but I have had no luck adding the previous hour's count to the total...

Any ideas?


Glasses
Builder

...| table the fields from the inputlookup, add the results from the current search to that table, then | outputlookup...

I am guessing.
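
That guess can be sketched in SPL. Assuming the Failed_Count lookup with host and count fields from earlier, one way is to append the stored rows to the current hour's counts, re-sum per host, and write the result back:

index=foo "failed"
| stats count by host
| append [| inputlookup Failed_Count]
| stats sum(count) as count by host
| outputlookup Failed_Count

This keeps a single running total per host in the lookup; to4kawa's append=t approach in the accepted answer achieves a similar effect without the subsearch, by storing one row per hour instead.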
