Splunk Search

How to send a recovery alert only when there is a corresponding alert?

anasar
New Member

Hi,

We have many indexes, such as server and core, and we have a lookup table with two columns: exception and threshold. The requirement is to search all indexes for the exceptions listed in the lookup table; if any exception from the lookup table is found in an index and its count is equal to or greater than the threshold value from the lookup, then alert.

After 5 minutes, it should search again, and if the exceptions are no longer happening in the index, it should send a recovery alert, but only for those exceptions for which we already sent an alert. So the intention here is to send a recovery alert only when there was a corresponding alert.

While alerting, we need to specify which exception happened, how many events occurred (which should be greater than or equal to the threshold), the source IP, etc. The recovery alert should contain the same information.

Example

Lookup table: exception.csv
exceptions,threshold
OutOfMemoryError,1
ORA-1112,5
JVMExceptions,2
etc.
1 Solution

gcusello
SplunkTrust

Hi anasar,
if the exceptions are in a field, they are easier to manage:

your_search [ | inputlookup exception.csv | fields exceptions ] 
| stats count BY exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

If instead (as I think is your case) the exceptions are strings inside your raw events, it's less easy!

your_search [ | inputlookup exception.csv | rename exceptions AS query | fields query ] 
| rename _raw AS rawText
| eval foo=[
   | inputlookup exception.csv 
   | eval query="%"+exceptions+"%" 
   | stats values(query) AS query 
   | eval query=mvjoin(query,",") 
   | fields query 
   | format "" "" "" "" "" ""
   ]
| eval foo=split(foo,",") 
| mvexpand foo 
| where like(rawText,foo)
| eval exceptions=replace(foo,"%","")
| stats count BY exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

This solves the first requirement, but I'm not sure about the second one: "After 5 mins, search again and if the exceptions are not happening in the index, then send a recovery alert only for those exceptions for which an alert was already sent. So the intention here is to send a recovery alert only when there was a corresponding alert."
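For the recovery part, one possible approach (an untested sketch; alerted_exceptions.csv is a hypothetical lookup that the alert search would maintain by ending with | outputlookup alerted_exceptions.csv) is to compare the saved alerted exceptions against the current counts on each scheduled run:

your_search earliest=-5m 
| stats count BY exceptions 
| append [ | inputlookup alerted_exceptions.csv | eval alerted=1 ] 
| stats sum(count) AS count max(alerted) AS alerted BY exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where alerted=1 AND (isnull(count) OR count < threshold) 
| table exceptions count threshold 

The results are exceptions that triggered an alert in a previous run but are now absent or below threshold, i.e. recovery candidates (assuming your_search yields the exceptions field as in the searches above).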

Bye.
Giuseppe


anasar
New Member

Thank you gcusello. It works.


anasar
New Member

I'm planning to use a summary index to save the alerts and then use it for the recovery alerts, but I'm not clear on how to do the rest. Please help.
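The idea I have in mind is something like this (just a sketch; the index name exception_alerts and the field names are my own guesses): append | collect index=exception_alerts to the alert search to save the alerted results, then schedule a recovery search like:

index=exception_alerts earliest=-10m latest=-5m 
| stats latest(count) AS alerted_count BY exceptions 
| join type=left exceptions 
    [ search your_search earliest=-5m | stats count AS current_count BY exceptions ] 
| where isnull(current_count) 
| table exceptions alerted_count 

But I'm not sure this is the right way to do it.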


anasar
New Member

Also, I need to add one more column to the lookup file: severity, which says whether the exception is a warning (1) or critical (2). While sending the mail, we need to check the severity, and the alert subject will be "Critical Problem alert ..." or "Warning Problem alert ...". The recovery alert ("Recovery ...") does not need to refer to the severity field. So exception.csv will look like:

exceptions,threshold,severity
OutOfMemoryError,1,2
ORA-1114,5,1
JVMExceptions,10,2
etc.
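For the severity part, I think the alert search only needs the lookup to also output severity, plus an eval for the subject, something like this (a sketch based on the accepted answer):

your_search 
| stats count BY exceptions 
| lookup exception.csv exceptions OUTPUT threshold severity 
| where count >= threshold 
| eval subject=if(severity=2, "Critical Problem alert", "Warning Problem alert") 
| table exceptions count threshold severity subject 

Does that look right?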
