Monitoring Host/client connectivity

PDXKiel
Path Finder

Hello, I'd imagine someone has already had this issue and solved it, but I can't find it in Answers and hope someone can help.

We connect to a large set of clients and want to know when any of them go down. So far we've created this search to do that:

sourcetype=tandem* "Host is OFFLINE" OR "Host is ONLINE"
| rex field=Text "Host is (?P<Status>\w+)"
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="OFFLINE"
| sort - _time

I needed to expand which messages count as a host being up or down, so I set up two event types (listed below). I've also found that if a host has been down longer than the search time range (24 hours by default), it won't get picked up.

HostOffline: (sourcetype=tandem* "Host is OFFLINE" OR "Host COMMUNICATIONS FAILURE")
HostOnline: (sourcetype=tandem* "Host is ONLINE" OR "Host COMMUNICATIONS resumed")

But when I substitute these event types into my search it returns a lot of false positives. Is there a better way to go about this? Any information is greatly appreciated, thanks!

1 Solution

PDXKiel
Path Finder

Thanks everyone for your help and suggestions. After more research I ended up writing this search that solved it:

index=cardservices sourcetype=tandem:epoc eventtype=HostOnline OR eventtype=HostOffline HostLogo=*
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="Offline"
| sort - _time

amitm05
Builder

@PDXKiel

You might want to try the dedup command, which keeps only the most recent event for each unique combination of the fields you specify.
So, something like:

<Your Search>| dedup Status, HostLogo, entity, Sys | table <your fields>

Tweak it as needed for your search.
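
As a rough sketch of how that could look with the event types from the question: the Status eval is borrowed from the accepted solution, and the table fields are only an assumption about what the panel should show. Status is left out of the dedup field list here, because dedup keeps the first event it encounters for each unique combination of the listed fields (the most recent one in the default reverse-time order), so including Status would keep one Online row and one Offline row per host instead of a single latest row.

```
(eventtype=HostOnline OR eventtype=HostOffline) HostLogo=*
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| dedup HostLogo, entity, Sys
| where Status="Offline"
| table _time, HostLogo, entity, Sys, Status
```
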
Thanks

amitm05
Builder

Please let me know the status. Accept the answer if it helped, or let me know if there is anything more you'd like to discuss.

adonio
Ultra Champion

I'm not really sure what the requirement is here. Can you elaborate?
Why not search only for the failures and present the table you want with stats?
Then there's no need for the where clause.

PDXKiel
Path Finder

Hi adonio, thanks for the quick response. I want to create a dashboard panel that refreshes every minute or so and lists the clients that haven't come back up, no matter how long they've been down. Two conditions in the logs tell us when a client goes down and when it comes back up (see the event types I created). A client can bounce several times and still be down if we don't see one of the messages indicating it has come back up, so the idea behind latest() was to get the most recent status for each client and then show only those whose status is "OFFLINE".
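
To catch clients that went down before the panel's default time window, one option is to set the time range inside the search itself. This is only a sketch: the 30-day lookback is an arbitrary assumption, the parentheses and the DownFor column are additions for illustration, and the rest is the search from the accepted solution.

```
index=cardservices sourcetype=tandem:epoc (eventtype=HostOnline OR eventtype=HostOffline) HostLogo=* earliest=-30d@d latest=now
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="Offline"
| eval DownFor=tostring(now() - _time, "duration")
| sort - _time
```

A longer lookback means each refresh scans more events, so the window is a trade-off against how often the panel refreshes.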
