I'm attempting to suppress an alert if a follow-up event (condition) is received within 60 seconds of the initial event (condition) from the same host. In this case a network switch is alerting on a BFD neighbor down event, and I want to suppress the alert if a BFD neighbor up event is received within 60 seconds.
This is the event data received:
Initial BFD Down:
2025-05-07T07:20:40.482713-04:00 "switch_name" : 2025 May 7 07:20:40 EDT: %BFD-5-SESSION_STATE_DOWN: BFD session 1124073489 to neighbor "IP Address" on interface Vlan43 has gone down. Reason: Administratively Down.
host = "switch_name"
Second event to nullify the alert:
2025-05-07T07:20:41.482771-04:00 "switch_name" : 2025 May 7 07:20:41 EDT: %BFD-5-SESSION_STATE_UP: BFD session 1124073489 to neighbor "IP Address" on interface Vlan43 is up.
host = "switch_name"
Search for up/down events and take the most recent for each host (switch). Discard all of the up events and anything newer than 60 seconds. The remainder will be down events at least a minute old without a following up event.
index=foo ("SESSION_STATE_DOWN" OR "SESSION_STATE_UP")
| dedup host
| where match(_raw, "SESSION_STATE_DOWN") AND _time<relative_time(now(), "-60s")
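For anyone who wants to check the logic without a Splunk instance, here is a minimal Python sketch of what the search above does. It is an illustration only, not SPL: the `Event` type and the `hosts_to_alert` function are hypothetical names, and the sample events are made up. It sorts newest first, matching the order Splunk returns events in by default, so keeping the first event per host mirrors `dedup host`.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float   # epoch seconds, like _time
    host: str
    raw: str      # event text, like _raw

def hosts_to_alert(events, now, hold_seconds=60):
    """Return hosts whose most recent BFD event is a DOWN older than hold_seconds."""
    latest = {}
    # Sort newest first, then keep the first event seen per host (like `dedup host`).
    for ev in sorted(events, key=lambda e: e.time, reverse=True):
        latest.setdefault(ev.host, ev)
    # Like: where match(_raw, "SESSION_STATE_DOWN") AND _time<relative_time(now(), "-60s")
    return sorted(
        host for host, ev in latest.items()
        if "SESSION_STATE_DOWN" in ev.raw and ev.time < now - hold_seconds
    )

now = 1_000_000.0
events = [
    Event(now - 300, "sw1", "%BFD-5-SESSION_STATE_DOWN"),  # flap: down ...
    Event(now - 299, "sw1", "%BFD-5-SESSION_STATE_UP"),    # ... then up 1s later
    Event(now - 120, "sw2", "%BFD-5-SESSION_STATE_DOWN"),  # down for 2 minutes, no up
    Event(now - 10,  "sw3", "%BFD-5-SESSION_STATE_DOWN"),  # down, but only 10s ago
]
print(hosts_to_alert(events, now))  # ['sw2']
```

Only `sw2` is reported: `sw1` recovered, and `sw3` has not yet been down for a full minute, so it would only show up on a later run if no up event arrives.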
Thanks for the reply!
If I break down this query to just:
index=main ("SESSION_STATE_DOWN" OR "SESSION_STATE_UP") | dedup host
The only results are up sessions; "dedup host" removes all of the down sessions, as they're from the same host.
Does the remaining
| where match(_raw, "SESSION_STATE_DOWN") AND _time<relative_time(now(), "-60s")
operate only on the results of | dedup host, or on the results of the whole search? If it only sees the dedup output, it will never find a "SESSION_STATE_DOWN" event, because those have already been filtered out.
Each command works only with the results from the previous command.
If the most recent event from each host is SESSION_STATE_UP then your job is done since there are no down hosts. The where command will find no SESSION_STATE_DOWN events so there are none to display or alert on. This is the desired state (isn't it?).
OTOH, if a host currently is down then dedup host will return SESSION_STATE_DOWN for that host and the where command will decide if the host has been down long enough to worry about.
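To make the "each command only sees the previous command's output" point concrete, here is a rough Python model of the two-stage pipeline. The stage functions, the dictionary event shape, and the timestamps are all hypothetical stand-ins for illustration; the real work is done by `dedup` and `where` in Splunk. It also shows one way to reason about both outcomes on synthetic events.

```python
from functools import reduce

def dedup_host(events):
    """Keep the first event per host; input is assumed to be newest first."""
    seen, kept = set(), []
    for ev in events:
        if ev["host"] not in seen:
            seen.add(ev["host"])
            kept.append(ev)
    return kept

def where_old_down(now, hold=60):
    """Build a stage that keeps DOWN events older than `hold` seconds."""
    def stage(events):
        return [ev for ev in events
                if ev["state"] == "DOWN" and ev["time"] < now - hold]
    return stage

def run_pipeline(events, stages):
    # Each stage receives only what the previous stage returned.
    return reduce(lambda results, stage: stage(results), stages, events)

now = 1000
# Newest first. The host flapped: DOWN at t=700, UP at t=701.
recovered = [
    {"time": 701, "host": "sw1", "state": "UP"},
    {"time": 700, "host": "sw1", "state": "DOWN"},
]
# Newest first. The host went DOWN at t=700 and never came back.
still_down = [
    {"time": 700, "host": "sw1", "state": "DOWN"},
    {"time": 650, "host": "sw1", "state": "UP"},
]
stages = [dedup_host, where_old_down(now)]
print(run_pipeline(recovered, stages))   # [] -> nothing to alert on
print(run_pipeline(still_down, stages))  # the DOWN event at t=700
```

In the first case the filter never sees the DOWN event, which is exactly what we want because the host recovered. In the second case the DOWN event is the newest one for the host, so it survives the dedup stage and the filter can test its age.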
Understood. To test this, I'll actually need to down an interface for at least 60 seconds to see the Down result. I'll need to get a network engineer involved to test. I will get back ASAP.
Hi @dflynn235
Were you able to try the search I provided?
I'm happy to help work through this if there's an issue with this approach.
@livehybrid wrote: Hi @dflynn235
Does the following do what you are looking for?
| eval status=case(searchmatch("has gone down"),"Down",searchmatch("is up"),"Up",true(),"Unknown")
| rex "on interface (?<iface>[a-zA-Z0-9]+)"
| stats range(_time) as downTime latest(status) as latestStatus by iface
| where downTime>60
Here is a working example with sample data; just add the | where clause to limit the results as required.
| makeresults count=1
| eval _raw="2025-05-07T07:20:40.482713-04:00 \"switch_name\" : 2025 May 7 07:20:40 EDT: %BFD-5-SESSION_STATE_DOWN: BFD session 1124073489 to neighbor \"IP Address\" on interface Vlan43 has gone down. Reason: Administratively Down."
| eval host="switch_name"
| append [| makeresults count=1
| eval _raw="2025-05-07T07:20:41.482771-04:00 \"switch_name\" : 2025 May 7 07:20:41 EDT: %BFD-5-SESSION_STATE_UP: BFD session 1124073489 to neighbor \"IP Address\" on interface Vlan43 is up."
| eval host="switch_name"]
| rex "^(?<timeStr>[^\s]+)"
| eval _time=strptime(timeStr,"%Y-%m-%dT%H:%M:%S.%6N%Z")
| eval status=case(searchmatch("has gone down"),"Down",searchmatch("is up"),"Up",true(),"Unknown")
| rex "on interface (?<iface>[a-zA-Z0-9]+)"
| stats range(_time) as downTime latest(status) as latestStatus by iface
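If it helps to see what the `stats` line computes, below is a small Python sketch run against the same two sample events. It is an approximation under stated assumptions, not a reimplementation of Splunk: `summarize` is a hypothetical helper, the substring checks are case-sensitive, and `range(_time)` is modeled simply as the newest timestamp minus the oldest per interface.

```python
import re
from datetime import datetime

SAMPLE = [
    '2025-05-07T07:20:40.482713-04:00 "switch_name" : 2025 May 7 07:20:40 EDT: '
    '%BFD-5-SESSION_STATE_DOWN: BFD session 1124073489 to neighbor "IP Address" '
    'on interface Vlan43 has gone down. Reason: Administratively Down.',
    '2025-05-07T07:20:41.482771-04:00 "switch_name" : 2025 May 7 07:20:41 EDT: '
    '%BFD-5-SESSION_STATE_UP: BFD session 1124073489 to neighbor "IP Address" '
    'on interface Vlan43 is up.',
]

def summarize(raw_events):
    """Per interface: seconds between first and last event, and the latest status."""
    by_iface = {}
    for raw in raw_events:
        # The leading ISO 8601 timestamp plays the role of _time.
        ts = datetime.fromisoformat(raw.split()[0]).timestamp()
        iface = re.search(r"on interface ([a-zA-Z0-9]+)", raw).group(1)
        if "has gone down" in raw:
            status = "Down"
        elif "is up" in raw:
            status = "Up"
        else:
            status = "Unknown"
        by_iface.setdefault(iface, []).append((ts, status))
    summary = {}
    for iface, rows in by_iface.items():
        times = [t for t, _ in rows]
        summary[iface] = {
            "downTime": max(times) - min(times),  # like range(_time)
            "latestStatus": max(rows)[1],         # like latest(status)
        }
    return summary

result = summarize(SAMPLE)
print(result["Vlan43"]["latestStatus"])  # Up

lone_down = summarize(SAMPLE[:1])
print(lone_down["Vlan43"])  # {'downTime': 0.0, 'latestStatus': 'Down'}
```

One thing the sketch suggests: with only a single Down event for an interface, the range is 0, so a threshold on downTime alone would not pick it up. It is probably worth checking latestStatus as well when deciding what to alert on.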
Thanks for the reply!
I ran the query as is and received some odd results. I slightly modified it as shown:
index=main sourcetype="cisco:ios" ("SESSION_STATE_DOWN" OR "SESSION_STATE_UP")
| eval status=case(searchmatch("%BFD-5-SESSION_STATE_DOWN"),"Down",searchmatch("%BFD-5-SESSION_STATE_UP"),"Up",true(),"Unknown")
| rex "on interface (?<iface>[a-zA-Z0-9]+)"
| stats range(_time) as downTime latest(status) as latestStatus by iface
| where downTime<60
This produced 2 results for the past 7 days:
Can this be run in real time, and can an alert be generated when latestStatus is Down?