Hello,
I am currently testing Splunk for our Cisco backbone network and I would like to pick out two scenarios.
1.) When a power outage occurs, several power supply failures occur at the same time
2.) When there is a fiber cut, multiple pseudowires go down at the same time
How can I search for these so that such a failure is detected immediately?
Thanks a lot
That depends on what events you're getting from your equipment. Remember that in general we're splunkers here, not necessarily specialists on Cisco, Juniper, Windows Server or whatever solution you can imagine (although there are many people here, and some have probably had experience with various environments).
So it's a question back to you: how do the situations you outlined present themselves in the logs? If you can tell us that, we might be able to tell you how to build a search that will find them.
Thank you for your reply. I understand and respect that you are splunkers and not Cisco specialists.
When a pseudowire goes down, the log entry looks like this, and it appears at around the same time (+/- 5 sec) on several devices (maybe more than 5):
Jul 25 22:43:32 <IP-Addr> 115292: RP/0/RSP0/CPU0:Jul 25 22:44:41.504 MEST: l2vpn_mgr[1181]: %L2-L2VPN-PW-3-UPDOWN : Pseudowire with address <IP-Addr>, id 12345, state is changed to: Down
When power supply goes down the logs look like:
Jun 27 13:24:08 <IP-Address> 349: Jun 27 13:24:02.461 MEST: %IOSXE_PEM-3-PEMFAIL: The PEM in slot P0 is switched off or encountering a failure condition
or .....%IOSXE_RP_ALARM-2-PEM: asserted CRITICAL Power Supply Module 0: Power Supply Failure
Now we're getting somewhere 🙂
If I remember correctly, Cisco logs carry a fairly unique identifier for each event type. In your case it would be %L2-L2VPN-PW-3-UPDOWN and %IOSXE_PEM-3-PEMFAIL respectively. Since you want to catch only pseudowires going down, I think you'd want to search for
L2-L2VPN-PW-3-UPDOWN "changed to: Down"
in the first case.
So that's the part that selects the events. Now, to find out whether the same thing happened on several devices, we have to aggregate over a time window using streamstats
| streamstats time_window=5s dc(host) as failedhosts
Now if you want only those events where the number of simultaneously failed hosts is high enough, just filter the results
| where failedhosts>5
And that's it. You have two searches
L2-L2VPN-PW-3-UPDOWN "changed to: Down"
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5
and
IOSXE_PEM-3-PEMFAIL
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5
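One thing to watch: your power-supply sample shows a second message form, %IOSXE_RP_ALARM-2-PEM, which the second search would not match. A minimal, untested sketch that accepts either identifier might look like the following. It assumes both forms should count toward the same outage, that matching on "asserted" is enough to skip any corresponding clear messages, and that the host field holds the originating device rather than a syslog relay; adjust it if your data differs.

```spl
(IOSXE_PEM-3-PEMFAIL OR (IOSXE_RP_ALARM-2-PEM asserted "Power Supply Failure"))
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5
```

Because the count is dc(host), a device that logs both message forms for the same failure is still counted only once.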