Hello Team,
We have an alert that searches for the string 'is now DOWN' and fires when it matches. However, we have noticed that the node often comes back up within a few seconds to a few minutes, so we only want the alert if the node has not come back up after 5 minutes. Can anyone help us?
We want the alert to follow this flow:
search for the string 'is now DOWN'
condition: check for the string '10.83.29.240 is now UP' during the next 5 minutes before sending the alert
Log output:
INFO [GossipTasks:1] 2020-06-30 01:42:40,115 Gossiper.java:1041 - InetAddress /10.83.29.240 is now DOWN
INFO [SharedPool-Worker-4] 2020-06-30 01:42:51,401 Gossiper.java:1026 - InetAddress /10.83.29.240 is now UP
As you can see, the node came back up after about 11 seconds.
Thanks
Chandra
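Something like this should do it. It pulls all up and down events and extracts the node address and the state into fields. Then the most recent state and its timestamp are kept for each node, and only the nodes that have been DOWN for more than 5 minutes are returned. Run it over a time range longer than 5 minutes (for example, the last 15 minutes) so the DOWN event is still within the search window.
index=foo ("is now DOWN" OR "is now UP")
| rex "InetAddress /(?<InetAddress>\S+) is now (?<state>\w+)"
| stats latest(_time) as _time latest(state) as state by InetAddress
| where state="DOWN" AND _time<=relative_time(now(), "-5m")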
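If you want to sanity-check the logic outside Splunk, here is a rough Python sketch of the same idea: keep the latest state per node and report only the nodes whose latest event is DOWN and at least 5 minutes old. The regex is an assumption based only on the two sample log lines in the question, and the function name `nodes_down_too_long` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the two sample Gossiper lines (an assumption,
# not an official Cassandra log format specification).
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*"
    r"InetAddress /(?P<addr>\S+) is now (?P<state>UP|DOWN)"
)

# The two log lines quoted in the question.
sample = [
    "INFO [GossipTasks:1] 2020-06-30 01:42:40,115 Gossiper.java:1041 - InetAddress /10.83.29.240 is now DOWN",
    "INFO [SharedPool-Worker-4] 2020-06-30 01:42:51,401 Gossiper.java:1026 - InetAddress /10.83.29.240 is now UP",
]


def nodes_down_too_long(lines, now, grace=timedelta(minutes=5)):
    """Return addresses whose latest event is DOWN and at least `grace` old."""
    latest = {}  # address -> (timestamp, state) of the most recent event
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # ignore lines that are not UP/DOWN events
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        addr = m.group("addr")
        if addr not in latest or ts >= latest[addr][0]:
            latest[addr] = (ts, m.group("state"))
    return sorted(
        addr
        for addr, (ts, state) in latest.items()
        if state == "DOWN" and now - ts >= grace
    )
```

With the two sample lines the node's latest state is UP, so nothing is reported. If only the DOWN line is present and `now` is more than 5 minutes later, the address is returned.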