Splunk Search

How to send a recovery alert only when there is a corresponding alert?

anasar
New Member

Hi,

We have many indexes, such as server and core, and a lookup table with two columns: exception and threshold. The requirement is to search for all exceptions in each index, and if any exception from the lookup table is found in the index with a count equal to or greater than its threshold value from the lookup, then alert.

After 5 minutes, it should search again, and if the exceptions are no longer occurring in the index, it should send a recovery alert only for those exceptions for which we already sent an alert. The intention is to send a recovery alert only when there was a corresponding alert.

When alerting, we need to specify which exception happened, how many events occurred (which should be greater than the threshold), the source IP, etc. The recovery alert should contain the same info.

E.g.

Lookup table: exception.csv

exceptions,threshold
OutOfMemoryError,1
ORA-1112,5
JVMExceptions,2
etc.
1 Solution

gcusello
SplunkTrust

Hi anasar,
if exceptions are in a field they are easier to manage:

your_search [ | inputlookup exception.csv | fields exceptions ] 
| stats count by exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

If instead (as I think you have) exceptions are strings in your events, it's less easy!

your_search [ | inputlookup exception.csv | rename exceptions AS query | fields query ] 
| rename _raw AS rawText
| eval foo=[
   | inputlookup exception.csv 
   | eval query="%"+exceptions+"%" 
   | stats values(query) AS query 
   | eval query=mvjoin(query,",") 
   | fields query 
   | format "" "" "" "" "" ""
   ]
| eval foo=split(foo,",") 
| mvexpand foo 
| where like(rawText,foo)
| eval exceptions=trim(foo,"%")
| stats count by exceptions 
| lookup exception.csv exceptions OUTPUT threshold 
| where count >= threshold 
| table exceptions count threshold 

This solves the first requirement, but I'm not sure about the second one: searching again after 5 minutes and sending a recovery alert only for those exceptions for which an alert was already sent.
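A possible sketch for the second requirement, assuming the alerting search saves the exceptions it fired on to a tracking lookup (the lookup name fired_alerts.csv is illustrative, not an existing file): first append to the end of the alerting search

```
| outputlookup fired_alerts.csv
```

then schedule a recovery search 5 minutes later that keeps only the previously alerted exceptions that no longer occur:

```
| inputlookup fired_alerts.csv
| join type=left exceptions
    [ search your_search earliest=-5m
      | stats count AS current_count by exceptions ]
| where isnull(current_count)
| table exceptions count threshold
```

Any rows left are exceptions that alerted earlier but are no longer occurring, so the recovery alert can fire on them with the same fields as the original alert.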

Bye.
Giuseppe


anasar
New Member

Thank you gcusello. It works.


anasar
New Member

I'm planning to use a summary index to save the alerts and use it for the recovery alerts, but I'm not clear on how to do the other parts. Please help.
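A minimal sketch of the summary-index approach (the index name summary and the marker value are illustrative, and this assumes exceptions is a field as in the first search above): have the alerting search write its results into the summary index with collect

```
your_search [ | inputlookup exception.csv | fields exceptions ]
| stats count by exceptions
| lookup exception.csv exceptions OUTPUT threshold
| where count >= threshold
| collect index=summary marker="report=exception_alerts"
```

and have the recovery search, scheduled 5 minutes later, compare the summary against the current events:

```
index=summary report=exception_alerts earliest=-15m
| stats latest(count) AS alerted_count by exceptions
| join type=left exceptions
    [ search your_search earliest=-5m
      | stats count AS current_count by exceptions ]
| where isnull(current_count)
| table exceptions alerted_count
```

Exceptions present in the summary but absent from the last 5 minutes are the ones to send recovery alerts for.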


anasar
New Member

Also, I need to add one more column to the lookup file: severity, which says whether the exception is a warning (1) or critical (2). So when sending the mail we need to check the severity, and the alert subject will be "Critical Problem alert ..." or "Warning Problem alert ...". The recovery alert ("Recovery ...") doesn't need to refer to the severity field. The exception.csv will then look like:

exceptions,threshold,severity
OutOfMemoryError,1,2
ORA-1114,5,1
JVMExceptions,10,2
etc.
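A sketch of how the severity column could drive the mail subject (the subject field name is illustrative): extend the alerting search with

```
| lookup exception.csv exceptions OUTPUT threshold severity
| where count >= threshold
| eval subject=if(severity=2, "Critical Problem alert", "Warning Problem alert")
| table exceptions count threshold severity subject
```

and reference the field in the email alert action's subject with a result token such as $result.subject$. The recovery search can simply omit the severity lookup, since the recovery subject doesn't depend on it.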
