I have an alert that correctly captures a log event stating that a service is down. We have started to receive many false positives because the service recovers automatically within seconds. I would like to change the alert so that, instead of sending a notification immediately, it waits 30 seconds, searches for a recovery event, and sends the notification only if no recovery is found.
edit:
index=networklogs (host=foo10* OR host=foo11*) AND ("member" AND "monitor status down")
|rex "monitor status\s+(?<State>\w+)"
|rex "member /Common/(?<trpHost>[^:]+):53"
|eval Identifier=trpHost + " dropped out of the VIP pool"
|eval Summary="Critical Infrastructure - Server dropped out of the VIP pool. Pool member is " + State + "."
|eval ProcessID="foo"
|eval Severity=if(State=="down", 5, 1)
|eval Type=if(State=="down", 1, 2)
|eval OwnerGID=1000
|eval ForceUpdateFields="Severity,Type,Summary"
|eval Submitter="foo"
|eval LOB="IP"
|eval AlertGroup="VIP Member Dropped out"
|eval Agent="rdns"
Try this
*UPDATED*
index=networklogs (host=foo10* OR host=foo11*) AND "member" AND ("monitor status up" OR "monitor status down")
| rex "monitor status\s+(?<state>up|down)"
| transaction host startswith="monitor status down" endswith="monitor status up" maxspan=30s maxevents=2 keepevicted=t
| where closed_txn=0 AND state="down"
How can I test this?
I tried changing maxspan to 1s and setting the time range to a period where we had false positives with 6s of downtime, but I still didn't get a result.
My bad, try the updated query.
Would my test scenario be correct then? Should I adjust maxspan?
Yes, you could experiment with maxspan.
I'm not having success with this. Can you break down what you suggested and explain what each part is doing? I don't understand the filter closed_txn=0.
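closed_txn=0 will show transactions that were not closed, i.e. that don't have both events (start and end). keepevicted=t is what keeps these incomplete transactions in the results.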
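One way to test the search is to drop the final where clause and tabulate the fields that transaction adds, so you can see which transactions closed and which did not. This is a minimal sketch, assuming the same index, hosts and extraction as the query above; eventcount, duration and closed_txn are fields generated by the transaction command:

```
index=networklogs (host=foo10* OR host=foo11*) AND "member" AND ("monitor status up" OR "monitor status down")
| rex "monitor status\s+(?<state>up|down)"
| transaction host startswith="monitor status down" endswith="monitor status up" maxspan=30s maxevents=2 keepevicted=t
| table _time host state eventcount duration closed_txn
```

Run it over a time range that contains a known 6s outage. A down/up pair that falls inside maxspan should show up as one row with eventcount=2 and closed_txn=1, while a down event with no recovery inside maxspan should show closed_txn=0. Lowering maxspan below the outage length (for example to 1s) should move that outage from the first group to the second.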
Can you share your query?
I cannot edit the original post or submit any further replies, so here is the second search, which should trigger the alert if it returns no results:
index=networklogs host=foo10* OR host=foo11* AND ("member" AND "monitor status up")
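Rather than running two separate searches, the down and up events could be evaluated together in one scheduled search. The following is a sketch of a different technique (stats instead of transaction) and has not been tested against these logs. It assumes the "monitor status up" events contain the same "member /Common/...:53" text as the down events, and that the alert runs on a schedule over a window longer than 30 seconds (for example, every minute over the last 5 minutes):

```
index=networklogs (host=foo10* OR host=foo11*) AND "member" AND ("monitor status up" OR "monitor status down")
| rex "monitor status\s+(?<State>up|down)"
| rex "member /Common/(?<trpHost>[^:]+):53"
| stats latest(State) AS State latest(_time) AS lastChange BY host trpHost
| where State="down" AND lastChange <= now() - 30
```

Each row that survives the where clause is a pool member whose most recent event is "down" and is at least 30 seconds old, so a member that recovered within that time is filtered out. Grouping by trpHost as well as host keeps two members that drop at the same time on the same device from being mixed together.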
Was this resolved? I am trying to build the same kind of query. When a state changes from CLOSED to OPEN, I log messages such as "state changes from closed to open" and "state changes from open to close". I want to trigger an alert when the state changes from closed to open and does not change back to closed within 10 minutes.
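The same idea could be adapted to the closed/open case with a 10 minute threshold. This is only a sketch: your_index is a placeholder, grouping by host is an assumption about what identifies the thing whose state is changing, and the message text is taken verbatim from the question:

```
index=your_index ("state changes from closed to open" OR "state changes from open to close")
| eval newState=if(like(_raw, "%state changes from closed to open%"), "open", "closed")
| stats latest(newState) AS newState latest(_time) AS lastChange BY host
| where newState="open" AND lastChange <= now() - 600
```

Scheduled over a window longer than 10 minutes, this returns a row only when the most recent change was to open and it happened at least 600 seconds ago. With transaction instead, the equivalent change to the accepted answer would be maxspan=10m with startswith and endswith set to the two messages.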