Solved: How can I discover fiber cuts and power supply failures?

testman · ‎07-27-2022

Hello,

I am currently testing Splunk for our Cisco backbone network and I would like to detect two scenarios.

1.) When a power outage occurs, several power supply failures occur at the same time

2.) When there is a fiber cut, multiple pseudowires go down at the same time

How can I filter these reasonably so that such a failure is detected immediately?

Thanks a lot

PickleRick · ‎07-27-2022

Depends on what events you're getting from your equipment. Remember that in general we're splunkers here, not necessarily specialists on Cisco, Juniper, Windows Server or whatever solution you can imagine (although there are many people here and probably some have had some experience with various environments).

So it's a question back to you - how do those situations you outlined present themselves in the logs? If you can tell us that, we might be able to tell you how to write a search that will find them.

testman · ‎07-27-2022

Thank you for your reply. I understand and respect that you are splunkers and not Cisco specialists.

When a pseudowire goes down, the log entry looks like the one below, and it appears at around the same time (+/- 5 sec) on several devices (maybe more than 5):

Jul 25 22:43:32 <IP-Addr> 115292: RP/0/RSP0/CPU0:Jul 25 22:44:41.504 MEST: l2vpn_mgr[1181]: %L2-L2VPN-PW-3-UPDOWN : Pseudowire with address <IP-Addr>, id 12345, state is changed to: Down

When a power supply goes down, the logs look like:

Jun 27 13:24:08 <IP-Address> 349: Jun 27 13:24:02.461 MEST: %IOSXE_PEM-3-PEMFAIL: The PEM in slot P0 is switched off or encountering a failure condition

or .....%IOSXE_RP_ALARM-2-PEM: asserted CRITICAL Power Supply Module 0: Power Supply Failure

PickleRick · ‎07-28-2022

Now we're getting somewhere 🙂

If I remember correctly, Cisco log messages carry a fairly unique identifier for each event type. In your case it would be %L2-L2VPN-PW-3-UPDOWN and %IOSXE_PEM-3-PEMFAIL respectively. Since you want to catch only pseudowires going down, I think you'd want to search for

L2-L2VPN-PW-3-UPDOWN "changed to: Down"

in the first case.

So that's the part that selects the events. Now if we want to find out whether the same thing happened on several devices, we have to aggregate over a time window using streamstats

| streamstats time_window=5s dc(host) as failedhosts
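To check what streamstats produces before filtering anything, here is a minimal sketch that lists the running distinct-host count next to each event. The table command and its field list are an addition for inspection only, not part of the search itself. As far as I know, time_window needs the events in time order, which the default newest-first search order already satisfies.

```
L2-L2VPN-PW-3-UPDOWN "changed to: Down"
| streamstats time_window=5s dc(host) as failedhosts
| table _time host failedhosts
```

If failedhosts climbs during a known outage and stays at 1 for isolated events, the window is doing what you want.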

Now if you want only those events where the number of simultaneously failed hosts is high enough, just filter the results

| where failedhosts>5

And that's it. You have two searches:

L2-L2VPN-PW-3-UPDOWN "changed to: Down"
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5

and

IOSXE_PEM-3-PEMFAIL
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5
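One possible variation for alerting, as a sketch only: the power supply search can also match the second message from the sample logs (%IOSXE_RP_ALARM-2-PEM), and bin plus stats can replace streamstats so that each 5-second bucket gives one result row instead of one row per event. The "asserted" term is taken from the sample line and is meant to select only assert messages; whether the devices also log a matching clear message is an assumption. The hosts field name is made up for illustration. Keep in mind that fixed buckets can split one incident across a bucket boundary, which the sliding window in streamstats does not do.

```
(IOSXE_PEM-3-PEMFAIL OR (IOSXE_RP_ALARM-2-PEM "asserted"))
| bin _time span=5s
| stats dc(host) as failedhosts values(host) as hosts by _time
| where failedhosts>5
```

The same bin/stats tail should work after the pseudowire search as well. The threshold and the span are the values from the thread; adjust them to your network.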
