How can I discover fiber cuts and power supply failures?

testman
Engager

Hello,

I am currently testing Splunk for our Cisco backbone network and I would like to detect two scenarios:

1.) When a power outage occurs, several power supply failures occur at the same time

2.) When there is a fiber cut, multiple pseudowires go down at the same time

How can I search for these events so that such a failure is detected immediately?

Thanks a lot

PickleRick
SplunkTrust

That depends on what events you're getting from your equipment. Remember that, in general, we're Splunkers here, not necessarily specialists in Cisco, Juniper, Windows Server or whatever solution you can imagine (although there are many people here, and some have probably had experience with various environments).

So it's a question back to you: how do the situations you outlined present themselves in the logs? If you can tell us that, we might be able to tell you how to build a search that finds them.


testman
Engager

Thank you for your reply. I understand and respect that you are Splunkers and not Cisco specialists.

When a pseudowire goes down, the log entry looks like this and appears at around the same time (+/- 5 sec) on several devices (maybe more than 5):

Jul 25 22:43:32 <IP-Addr> 115292: RP/0/RSP0/CPU0:Jul 25 22:44:41.504 MEST: l2vpn_mgr[1181]: %L2-L2VPN-PW-3-UPDOWN : Pseudowire with address <IP-Addr>, id 12345, state is changed to: Down

When a power supply goes down, the logs look like this:

Jun 27 13:24:08 <IP-Address> 349: Jun 27 13:24:02.461 MEST: %IOSXE_PEM-3-PEMFAIL: The PEM in slot P0 is switched off or encountering a failure condition

or .....%IOSXE_RP_ALARM-2-PEM: asserted CRITICAL Power Supply Module 0: Power Supply Failure

 


PickleRick
SplunkTrust

Now we're getting somewhere 🙂

If I remember correctly, Cisco log messages carry a fairly unique identifier for each event type. In your case those would be %L2-L2VPN-PW-3-UPDOWN and %IOSXE_PEM-3-PEMFAIL, respectively. Since you only want to catch pseudowires going down, I think you'd want to search for

L2-L2VPN-PW-3-UPDOWN "changed to: Down"

in the first case.
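To make the selection step concrete outside of Splunk, here is a minimal Python sketch that pulls the %FACILITY-SEVERITY-MNEMONIC identifier out of a raw syslog line and applies the same two conditions. The regular expression, the `classify` function and the labels it returns are assumptions made for illustration; they are not part of Splunk or of any Cisco tooling.

```python
import re

# Assumed shape of the Cisco message code: %FACILITY-SEVERITY-MNEMONIC,
# optionally followed by whitespace, then a colon.
MSG_CODE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+(?:-[A-Z0-9_]+)*)"
    r"-(?P<severity>[0-7])"
    r"-(?P<mnemonic>[A-Z0-9_]+)\s*:"
)

def classify(line):
    """Return a hypothetical label for the two failure types, or None."""
    m = MSG_CODE.search(line)
    if m is None:
        return None
    code = f"{m['facility']}-{m['severity']}-{m['mnemonic']}"
    # Same condition as the search: the UPDOWN code plus the "Down" state.
    if code == "L2-L2VPN-PW-3-UPDOWN" and "changed to: Down" in line:
        return "pseudowire_down"
    if code == "IOSXE_PEM-3-PEMFAIL":
        return "power_supply_failure"
    return None
```

A pseudowire message that reports a change to Up carries the same code but fails the second condition, which is why the search needs the quoted phrase in addition to the identifier.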

So that's the part that selects the events. Now, to find out whether the same thing happened on several devices at around the same time, we have to aggregate over a time window using streamstats, here counting the distinct hosts seen within 5 seconds

| streamstats time_window=5s dc(host) as failedhosts

Now, if you want only those events where the number of simultaneously failed hosts is high enough, just filter the results

| where failedhosts>5

And that's it. You have two searches:

L2-L2VPN-PW-3-UPDOWN "changed to: Down"
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5

and

IOSXE_PEM-3-PEMFAIL
| streamstats time_window=5s dc(host) as failedhosts
| where failedhosts>5
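For anyone who wants to see the windowing logic spelled out, below is a rough Python sketch of what the streamstats and where steps compute: for each event, the number of distinct hosts seen in the trailing 5 seconds, keeping only events where that number exceeds 5. It illustrates the idea only and is not how Splunk implements it. The function name, the (timestamp, host) tuples, the ascending sort order and the exact window boundary are all assumptions.

```python
from collections import deque

def hosts_in_window(events, window_seconds=5, threshold=5):
    """Yield (timestamp, host, distinct_hosts) for events whose trailing
    time window contains more than `threshold` distinct hosts.

    `events` is an iterable of (timestamp_in_seconds, host) tuples,
    assumed to be sorted by ascending timestamp.
    """
    window = deque()
    for ts, host in events:
        window.append((ts, host))
        # Drop events that are window_seconds or more older than this one.
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        failed_hosts = len({h for _, h in window})
        if failed_hosts > threshold:
            yield ts, host, failed_hosts

# Six different routers reporting within 5 seconds trip the threshold.
burst = [(0, "r1"), (1, "r2"), (2, "r3"), (3, "r4"), (4, "r5"), (4.5, "r6")]
alerts = list(hosts_in_window(burst))
```

Because the count is over distinct hosts, one device that logs many pseudowire-down messages in a burst does not trigger the condition by itself; it takes more than five different devices within the window.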

 
