We have many indexes, such as server and core, and a lookup table with two columns: exception and threshold. The requirement is to search every index for exceptions, and if any exception from the lookup table is found in an index with a count equal to or greater than the threshold value from the lookup, raise an alert.
After 5 minutes it should search again, and if the exceptions are no longer occurring in the index, send a recovery alert, but only for those events for which we already sent an alert. So the intention is to send a recovery alert only when there was a corresponding alert.
When alerting, we need to specify which exception happened, how many events there were (the count should be equal to or greater than the threshold), the source IP, etc. The recovery alert should carry the same info.
Lookup table: exception.csv
exceptions,threshold
OutOfMemoryError,1
ORA-1112,5
JVMExceptions,2
etc.
I'm planning to use a summary index to save the alerts and use it for the recovery alerts, but the rest of the how-to is not clear to me. Please help.
I also need to add one more column to the lookup file: severity, which says whether the exception is a warning (1) or critical (2). When sending mail we need to check the severity, and the alert subject will be "Critical Problem alert" or "Warning Problem alert ...". The recovery alert ("Recovery ....") does not need to refer to the severity field. So exception.csv will look like:
exceptions, threshold, severity
OutOfMemoryError, 1, 2
ORA-1114, 5, 1
JVMExceptions, 10, 2
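To make the severity rule concrete, here is a minimal sketch in Python (not SPL) of how the subject line could be derived from the lookup. The function names and the ": <exception>" suffix on the subject are assumptions made for illustration; the question only fixes the "Critical Problem alert" / "Warning Problem alert" / "Recovery" prefixes.

```python
import csv
import io

# Sample content mirroring exception.csv above (note the spaces after commas).
EXCEPTION_CSV = """exceptions, threshold, severity
OutOfMemoryError, 1, 2
ORA-1114, 5, 1
JVMExceptions, 10, 2
"""

# Severity codes as described in the question: 1 = warning, 2 = critical.
SEVERITY_LABELS = {1: "Warning", 2: "Critical"}


def load_rules(text):
    """Parse the lookup into {exception: {"threshold": int, "severity": int}}."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rules = {}
    for row in reader:
        rules[row["exceptions"]] = {
            "threshold": int(row["threshold"]),
            "severity": int(row["severity"]),
        }
    return rules


def alert_subject(exception, rules):
    """Subject for a problem alert; the suffix format is hypothetical."""
    label = SEVERITY_LABELS[rules[exception]["severity"]]
    return f"{label} Problem alert: {exception}"


def recovery_subject(exception):
    """Recovery subjects ignore severity, as the question requires."""
    return f"Recovery: {exception}"
```

In Splunk itself the same decision would typically be made by returning severity from the lookup alongside threshold and branching on it when building the mail subject.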
If the exceptions are in a field, they are easier to manage:
your_search [ | inputlookup exception.csv | fields exceptions ] | stats count by exceptions | lookup exception.csv exceptions OUTPUT threshold | where count >= threshold | table exceptions count threshold
If instead (as I think is your case) the exceptions are strings inside your events, it is harder:
your_search [ | inputlookup exception.csv | rename exceptions AS query | fields query ] | rename _raw as rawText | eval foo=[ | inputlookup exception.csv | eval query="%"+exceptions+"%" | stats values(query) AS query | eval query=mvjoin(query,",") | fields query | format "" "" "" "" "" "" ] | eval foo=split(foo,",") | mvexpand foo | where like(rawText,foo) | eval exceptions=trim(foo,"%") | stats count by exceptions | lookup exception.csv exceptions OUTPUT threshold | where count >= threshold | table exceptions count threshold
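For readers who find the search hard to follow, the idea behind it can be sketched outside Splunk: test each raw event against every exception string from the lookup, count the matches per exception, then keep only the exceptions whose count reaches the threshold. This is a rough Python illustration of that logic, not a replacement for the search; the function names and the sample events are made up.

```python
from collections import Counter


def count_exceptions(raw_events, thresholds):
    """Count how many raw events contain each exception string."""
    counts = Counter()
    for raw in raw_events:
        for name in thresholds:
            # Plain substring test, the equivalent of like(rawText, "%name%").
            if name in raw:
                counts[name] += 1
    return counts


def breaches(counts, thresholds):
    """Keep only exceptions whose count is equal to or above the threshold."""
    return {name: n for name, n in counts.items() if n >= thresholds[name]}


# Hypothetical sample data, only for illustration.
thresholds = {"OutOfMemoryError": 1, "ORA-1114": 5}
events = [
    "2024-01-01 java.lang.OutOfMemoryError: Java heap space",
    "2024-01-01 ORA-1114: IO error writing block to file",
    "2024-01-01 request served in 12 ms",
]
hits = breaches(count_exceptions(events, thresholds), thresholds)
```

Here only OutOfMemoryError ends up in hits, because ORA-1114 appears once and its threshold is 5.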
This solves the first requirement, but I'm not sure about the second one: "After 5 mins, search again and if the exceptions are not happening in the index, then send recovery alert only for those events we already sent an alert. So intention here is send recovery alert only when there was a corresponding alert."
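One possible way to reason about that second requirement is as a small piece of state that survives between runs: every alert that is sent gets recorded, and on the next run a recovery is sent only for recorded alerts whose exception no longer appears. The sketch below shows that logic in Python, not as a tested Splunk solution. It assumes that "not happening" means a count of zero, that alerts are tracked per exception and source IP, and that thresholds are at least 1; the name open_alerts is hypothetical and stands in for whatever the summary index would store.

```python
def evaluate(counts, thresholds, open_alerts):
    """Process one scheduled run (for example, every 5 minutes).

    counts:      {(exception, source_ip): events seen in this run}
    thresholds:  {exception: threshold} taken from the lookup
    open_alerts: {(exception, source_ip): alert record} from earlier runs
    Returns (alerts, recoveries, updated_open_alerts).
    """
    alerts, recoveries = [], []
    still_open = dict(open_alerts)

    # Alert on every key whose count reaches its threshold, and remember it.
    for (exception, source_ip), n in counts.items():
        threshold = thresholds.get(exception)
        if threshold is not None and n >= threshold:
            record = {
                "exception": exception,
                "source_ip": source_ip,
                "count": n,
                "threshold": threshold,
            }
            alerts.append(record)
            still_open[(exception, source_ip)] = record

    # Recover only keys that were alerted before and are now absent.
    for key, record in open_alerts.items():
        if counts.get(key, 0) == 0:
            recoveries.append(record)  # same info as the original alert
            del still_open[key]

    return alerts, recoveries, still_open
```

A key that was alerted earlier and still shows some events below the threshold stays open and produces neither an alert nor a recovery in this sketch; whether that is the desired behaviour is a design choice the question leaves open. In Splunk, the role of open_alerts would be played by the summary index mentioned in the question: the alert search writes its results there, and the recovery search reads them back and compares them with the current counts.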