Ok, so here is my question
These three messages show the possible states of a scheduled downtime:
MSG WHEN DOWNTIME START
STARTED
SERVICE DOWNTIME ALERT: afakeci;Service Schedule;STARTED; Service has entered a period of scheduled downtime
CANCELLED
SERVICE DOWNTIME ALERT: afakeci;Service Schedule;CANCELLED; Scheduled downtime for service has been cancelled.
STOPPED
SERVICE DOWNTIME ALERT: afakeci;Service Schedule;STOPPED; Service has exited from a period of scheduled downtime
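For reference, each of these messages is four semicolon-separated fields after the "SERVICE DOWNTIME ALERT:" prefix: host, service, state and a free-text comment. A minimal Python sketch of that layout follows. parse_downtime_alert is a made-up helper, not part of Nagios or Splunk, and the dictionary keys are only my assumption about how the fields map:

```python
PREFIX = "SERVICE DOWNTIME ALERT: "

def parse_downtime_alert(line):
    """Split one downtime message into its fields (hypothetical helper)."""
    if not line.startswith(PREFIX):
        return None
    # maxsplit=3 keeps any ';' inside the free-text comment intact.
    parts = line[len(PREFIX):].split(";", 3)
    if len(parts) != 4:
        return None
    host_name, service_name, state, comment = (p.strip() for p in parts)
    return {"host_name": host_name, "service_name": service_name,
            "state": state, "comment": comment}

sample = ("SERVICE DOWNTIME ALERT: afakeci;Service Schedule;STOPPED; "
          "Service has exited from a period of scheduled downtime")
print(parse_downtime_alert(sample)["state"])  # prints STOPPED
```

The third field is the one that matters for my question: STARTED means the maintenance window is open, while CANCELLED or STOPPED means it is over.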
Then I have my SPL search, which looks for critical service alerts and hosts that are down:
index="test" earliest=-1h
AND
(
( eventname="SERVICE NOTIFICATION" AND status_code=CRITICAL ) OR
( eventname="PASSIVE SERVICE CHECK" AND state=2 )
AND ( service_name="CPU utilization" OR
service_name="Memory and pagefile*" OR
service_name="Filesystem*" OR
service_name="Service*" OR
service_name="File*") )
OR
( eventname="HOST NOTIFICATION" AND status_code="DOWN" )
| eval service_name=if( eventname=="HOST NOTIFICATION" , "ICMP request (PING)" , service_name )
| eval status_code=if( eventname=="PASSIVE SERVICE CHECK" , "CRITICAL" , status_code )
| eval nagios_timestamp=timestamp
| convert ctime(nagios_timestamp)
| eval alert_seed=status_code.".".host_name.".".service_name
What I want to do is:
IF my SPL search finds an event THEN
Execute a second query to find, within the last hour, at least one occurrence of downtime with a status other than STARTED:
index="test" host_name=afakeci earliest=-1h eventname="SERVICE DOWNTIME ALERT" ( "CANCELLED" OR "STOPPED" ) | stats count
If that count is greater than zero, maintenance mode has finished and the alert is real, so I can raise the alert
ELSE
Finish the execution
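To pin down the logic I am after, here is the same IF/ELSE written as a small Python sketch. It is only an illustration of the decision, not SPL: should_raise_alert and its arguments are hypothetical names, where alert_count stands for the number of events returned by my first search and downtime_states for the downtime states seen for the host in the last hour.

```python
# States that mean the scheduled downtime is over.
ENDED_STATES = {"CANCELLED", "STOPPED"}

def should_raise_alert(alert_count, downtime_states):
    """Return True only when there is an alert and downtime has ended."""
    if alert_count == 0:
        # First search found nothing: finish the execution.
        return False
    # Equivalent of the second search: count CANCELLED/STOPPED events.
    ended = sum(1 for state in downtime_states if state in ENDED_STATES)
    return ended > 0

print(should_raise_alert(1, ["STARTED", "STOPPED"]))  # True
print(should_raise_alert(1, ["STARTED"]))             # False
print(should_raise_alert(0, ["STOPPED"]))             # False
```

What I cannot work out is how to express this dependency between the two searches in a single SPL query.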
I have been looking at commands such as "transaction", "search" ... and have not found a solution yet =(