Getting Data In

Can you help me find the time between events?

mwk
Explorer

I've been trying to figure this out off and on for a few months now. I get close, but . . . and since my log output is different than anyone else's I've seen here, I'll post mine. I even had a Splunk instructor try this, and he couldn't get it to do what I want. I'm looking for: when did it happen, and how long was it down? At some point I'd like to extend that to: "Hey, this not only went down 6 hours ago, it's still down, so I'll send an alert out."

Here's a chunk of the log output. I've regex'd every piece of it six ways to Sunday trying to find something to run transaction or stats on, but I can't figure it out. Each line represents a host's path on a SAN fabric, so each line is actually unique, but in a good SAN environment we have two of everything, and there is overlap in things like switch port names, error messages, etc.

So, for every message about a host going down on a fabric, there should be a message logged about that host coming back up on that fabric. The same goes for the other fabric, and no, the fabrics don't know about each other (failure domain separation, etc.).

99% of the time, I don't care about the activity at all unless it looks like the host fell away on one fabric and not the other and then never came back. That's what I want to know about.

12/6/18
4:50:04.000 AM  
Dec  6 04:50:04 UNIQUE-SWITCH-NAME-1 : 2018 Dec  6 04:50:05 CST: %PORT-5-IF_UP: %$VSAN 11%$ Interface fc1/35 is up in mode F hostname-XXXXX-fabric1

12/6/18
4:50:02.000 AM  
Dec  6 04:50:02 UNIQUE-SWITCH-NAME-2: 2018 Dec  6 04:50:03 CST: %PORT-5-IF_UP: %$VSAN 12%$ Interface fc1/35 is up in mode F  hostname-XXXXX-fabric2 


12/6/18
4:50:04.000 AM  
Dec  6 04:50:04 UNIQUE-SWITCH-NAME-3 : 2018 Dec  6 04:50:05 CST: %PORT-5-IF_UP: %$VSAN 11%$ Interface fc1/35 is up in mode F hostname-ZZZZZ-fabric1

12/6/18
4:50:02.000 AM  
Dec  6 04:50:02 UNIQUE-SWITCH-NAME-4: 2018 Dec  6 04:50:03 CST: %PORT-5-IF_UP: %$VSAN 12%$ Interface fc1/35 is up in mode F  hostname-ZZZZZ-fabric2

12/6/18
4:47:14.000 AM  
Dec  6 04:47:14 UNIQUE-SWITCH-NAME-1 : 2018 Dec  6 04:47:15 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 12%$ Interface fc1/35 is down (Link failure loss of signal)  hostname-XXXXX-fabric1 

12/6/18
4:47:13.000 AM  
Dec  6 04:47:13 UNIQUE-SWITCH-NAME-2 : 2018 Dec  6 04:47:14 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 11%$ Interface fc1/35 is down (Link failure loss of signal)   hostname-XXXXX-fabric2 

12/6/18
4:47:13.000 AM  
Dec  6 04:47:14 UNIQUE-SWITCH-NAME-3 : 2018 Dec  6 04:47:13 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 12%$ Interface fc1/36 is down (Link failure loss of signal)  hostname-ZZZZZ-fabric1 

12/6/18
4:47:13.000 AM  
Dec  6 04:47:13 UNIQUE-SWITCH-NAME-4 : 2018 Dec  6 04:47:14 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 11%$ Interface fc1/36 is down (Link failure loss of signal)   hostname-ZZZZZZ-fabric2 

efavreau
Motivator

Can you show how you would manually pair the two events of "it went down" and "it came back up"?

###

If this reply helps you, an upvote would be appreciated.

mwk
Explorer

This would be one host going down on two different SAN fabrics.
12/6/18
4:47:14.000 AM

Dec 6 04:47:14 UNIQUE-SWITCH-NAME-1 : 2018 Dec 6 04:47:15 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 12%$ Interface fc1/35 is down (Link failure loss of signal) hostname-XXXXX-fabric2

12/6/18
4:47:13.000 AM

Dec 6 04:47:13 UNIQUE-SWITCH-NAME-2 : 2018 Dec 6 04:47:14 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 11%$ Interface fc1/35 is down (Link failure loss of signal) hostname-XXXXX-fabric1

Hopefully after a few minutes you'd get:
12/6/18
4:50:04.000 AM

Dec 6 04:50:04 UNIQUE-SWITCH-NAME-3 : 2018 Dec 6 04:50:05 CST: %PORT-5-IF_UP: %$VSAN 11%$ Interface fc1/35 is up in mode F hostname-ZZZZZ-fabric1

12/6/18
4:50:02.000 AM

Dec 6 04:50:02 UNIQUE-SWITCH-NAME-4: 2018 Dec 6 04:50:03 CST: %PORT-5-IF_UP: %$VSAN 12%$ Interface fc1/35 is up in mode F hostname-ZZZZZ-fabric2

Hostnames are unique by fabric because we append the last 2 octets of the host's WWN to the hostname on that fabric, so myhost-departmental-abbrivation-89f5 would be on fabric 1 and myhost-departmental-abbrivation-ff87 would be on fabric 2. The two paths can go down and come up in a seemingly random order from Splunk's perspective, because each SAN fabric is logically and physically separate and has no concept of the other's existence. Events arrive in Splunk whenever they are logged and forwarded from each device, through rsyslog/splunkforwarder to the Splunk indexer.
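
Since the two fabric-specific hostnames differ only in their trailing WWN suffix, one way to compare both paths of a host is to strip that suffix and group on what remains. The search below is a minimal, untested sketch of that idea rather than a finished solution. It assumes the hostname is the last token of each event, that the suffix is always a hyphen followed by four hex characters (the sanitized samples above end in -fabric1/-fabric2 instead, so the pattern would need adjusting for them), and that events are indexed with a correct _time. The index and sourcetype values and the field names fabric_host, base_host, paths_down, paths_seen, and down_paths are made up for illustration. Notes inside the search use `rename COMMENT AS "..."`, which changes nothing because no field named COMMENT exists.

```spl
index=san_syslog sourcetype=syslog "Interface" ("is up" OR "is down")
| rex "Interface\s+(?<interface>\S+)\s+is\s+(?<state>up|down)\b"
| rex "\s(?<fabric_host>\S+)\s*$"
| rename COMMENT AS "ASSUMPTION: the suffix is a hyphen plus 4 hex characters taken from the WWN"
| eval base_host = replace(fabric_host, "-[0-9A-Fa-f]{4}$", "")
| rename COMMENT AS "Latest known state of each path, then a roll-up per host"
| stats latest(state) AS current_state latest(_time) AS last_change BY base_host fabric_host
| stats count(eval(current_state="down")) AS paths_down
        dc(fabric_host) AS paths_seen
        values(eval(if(current_state="down", fabric_host, null()))) AS down_paths
        BY base_host
| where paths_down > 0 AND paths_down < paths_seen
```

The final `where` keeps only hosts that are currently down on some paths but not all of them, which is the "fell away on one fabric and not the other" case. A host that has only ever logged on one fabric within the search window would not show up here.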


woodcock
Esteemed Legend

Your data does not appear to cover complete state changes, so I am positive that I do not understand your use case. However, this may help you sort it out:

|makeresults | eval raw="12/6/18 4:50:04.000 AM Dec  6 04:50:04 UNIQUE-SWITCH-NAME-1 : 2018 Dec  6 04:50:05 CST: %PORT-5-IF_UP: %$VSAN 11%$ Interface fc1/35 is up in mode F hostname-XXXXX-fabric1:::12/6/18 4:50:02.000 AM Dec  6 04:50:02 UNIQUE-SWITCH-NAME-2: 2018 Dec  6 04:50:03 CST: %PORT-5-IF_UP: %$VSAN 12%$ Interface fc1/35 is up in mode F  hostname-XXXXX-fabric2:::12/6/18 4:50:04.000 AM Dec  6 04:50:04 UNIQUE-SWITCH-NAME-3 : 2018 Dec  6 04:50:05 CST: %PORT-5-IF_UP: %$VSAN 11%$ Interface fc1/35 is up in mode F hostname-ZZZZZ-fabric1:::12/6/18 4:50:02.000 AM Dec  6 04:50:02 UNIQUE-SWITCH-NAME-4: 2018 Dec  6 04:50:03 CST: %PORT-5-IF_UP: %$VSAN 12%$ Interface fc1/35 is up in mode F  hostname-ZZZZZ-fabric2:::12/6/18 4:47:14.000 AM Dec  6 04:47:14 UNIQUE-SWITCH-NAME-1 : 2018 Dec  6 04:47:15 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 12%$ Interface fc1/35 is down (Link failure loss of signal)  hostname-XXXXX-fabric1:::12/6/18 4:47:13.000 AM Dec  6 04:47:13 UNIQUE-SWITCH-NAME-2 : 2018 Dec  6 04:47:14 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 11%$ Interface fc1/35 is down (Link failure loss of signal)   hostname-XXXXX-fabric2:::12/6/18 4:47:13.000 AM Dec  6 04:47:14 UNIQUE-SWITCH-NAME-3 : 2018 Dec  6 04:47:13 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 12%$ Interface fc1/36 is down (Link failure loss of signal)  hostname-ZZZZZ-fabric1:::12/6/18 4:47:13.000 AM Dec  6 04:47:13 UNIQUE-SWITCH-NAME-4 : 2018 Dec  6 04:47:14 CST: %PORT-5-IF_DOWN_LINK_FAILURE: %$VSAN 11%$ Interface fc1/36 is down (Link failure loss of signal)   hostname-ZZZZZZ-fabric2"
| makemv delim=":::" raw
| mvexpand raw
| rename raw AS _raw

| rename COMMENT "Everything above generates sample event data; everything below is part of your solution..."

| rex "^(?<time>.*?)\s+(?<host>UNIQ*\S+)\s*:.*?\s+Interface\s+(?<Interface>\S+)\s+is\s+(?<state>\S+)"
| eval _time = strptime(time, "%m/%d/%y %H:%M:%S")
| streamstats count(eval(state=="up")) AS HostPortStateSessionID BY host Interface
| stats first(state) AS current_state count range(_time) AS duration BY HostPortStateSessionID host Interface
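
To get closer to the original question (when did a path go down, how long was it down, and is it still down), one option is to key on the fabric-specific hostname at the end of each event rather than on the switch name, and to compare each down event with the next event for the same hostname. The following is a minimal, untested sketch under several assumptions: it runs against real indexed events with a correct _time, the hostname is the last token of each event, and the index and sourcetype values and the field names fabric_host, next_state, next_time, status, and outage_seconds are hypothetical.

```spl
index=san_syslog sourcetype=syslog "Interface" ("is up" OR "is down")
| rex "Interface\s+(?<interface>\S+)\s+is\s+(?<state>up|down)\b"
| rex "\s(?<fabric_host>\S+)\s*$"
| rename COMMENT AS "Newest first, so the previously processed event per hostname is the next one in time"
| sort 0 -_time
| streamstats current=f last(state) AS next_state last(_time) AS next_time BY fabric_host
| where state="down"
| eval status = case(next_state="up", "recovered", isnull(next_state), "still down", true(), "down again before any up")
| eval outage_seconds = if(isnull(next_time), now() - _time, next_time - _time)
| eval went_down = strftime(_time, "%Y-%m-%d %H:%M:%S")
| table fabric_host interface went_down status outage_seconds
```

For an alert, one possibility is to append `| where status="still down" AND outage_seconds > 21600` (six hours) and schedule the search. "Still down" here only means that no later event for that hostname was found, so the search time range has to reach back far enough to include the down event.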

cpetterborg
SplunkTrust

Be careful with the | rename COMMENT line above, as it can result in an error when you run the search (it does on 7.1.0, and probably on other versions).
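
The thread does not say what the error was. One plausible cause, offered here as an assumption rather than a confirmed diagnosis, is that `rename` expects the form `old AS new` and the line as posted has no AS keyword. A small sketch of the same comment idiom with AS included, using `makeresults` only to have something to run against:

```spl
| makeresults
| rename COMMENT AS "Everything above generates sample event data; everything below is part of your solution"
| eval note = "search continues here"
```

Because no field named COMMENT exists, the rename changes nothing and the quoted text serves only as a note to the reader.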
