Monitoring Host/client connectivity

PDXKiel
Path Finder

Hello, I'd imagine someone's already had this issue and solved it but I can't find it in Answers and hope someone can help.

We have a large set of clients we connect to, and we want to detect when they go down. So far we've created this search to do that:

sourcetype=tandem* "Host is OFFLINE" OR "Host is ONLINE"
| rex field=Text "Host is (?P<Status>\w+)"
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="OFFLINE"
| sort - _time

I needed to expand which messages count as a host being up or down, so I set up two event types (listed below). I've also found that if a host has been down longer than the search time range (default is 24 hours), it won't get picked up.

HostOffline: (sourcetype=tandem* "Host is OFFLINE" OR "Host COMMUNICATIONS FAILURE")
HostOnline: (sourcetype=tandem* "Host is ONLINE" OR "Host COMMUNICATIONS resumed")

But when I substitute these event types into my search it returns a lot of false positives. Is there a better way to go about this? Any information is greatly appreciated, thanks!

1 Solution

PDXKiel
Path Finder

Thanks everyone for your help and suggestions. After more research I ended up writing this search that solved it:

index=cardservices sourcetype=tandem:epoc eventtype=HostOnline OR eventtype=HostOffline HostLogo=*
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="Offline"
| sort - _time

amitm05
Builder

@PDXKiel

You might want to use the dedup command, which keeps the most recent event for each unique combination of the fields you specify.
So, something like:

<Your Search>| dedup Status, HostLogo, entity, Sys | table <your fields>

Tweak this as required for your search.
Thanks

amitm05
Builder

Please let me know the status. Accept the answer if it helped, or let me know if there is anything more you'd like to discuss.

adonio
Ultra Champion

Not really sure what the requirement is here. Can you elaborate?
Why not search only for the failures and present the table you want with stats?
There's no need for the where clause.

PDXKiel
Path Finder

Hi adonio, thanks for the quick response. I want to create a dashboard panel that refreshes every minute or so and gives us a list of clients that haven't come back up, no matter how long they've been down. There are 2 conditions in the logs that tell us when they go down and when they come back up (see the eventtypes I created). They can bounce several times and still be down if we don't see one of the messages indicating they've come back up, so the idea behind using latest() in stats was to get the most recent status for each host and then show only the ones where that status was "OFFLINE".
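
For anyone who wants to reason about the "latest status per host" logic outside Splunk, here is a rough Python sketch of the same idea. It is only an illustration, not how Splunk evaluates the search: the Event tuple, the classify and hosts_still_offline names, and the single host key (standing in for HostLogo, entity, Sys) are all made up for this example, and the message matching is case-sensitive here for simplicity.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Event(NamedTuple):
    """Hypothetical stand-in for one log event."""
    time: int   # event timestamp, e.g. epoch seconds
    host: str   # stands in for the HostLogo/entity/Sys grouping
    text: str   # raw message text


# Message fragments taken from the two event types in the question.
OFFLINE_MARKERS = ("Host is OFFLINE", "Host COMMUNICATIONS FAILURE")
ONLINE_MARKERS = ("Host is ONLINE", "Host COMMUNICATIONS resumed")


def classify(text: str) -> Optional[str]:
    """Map a message to OFFLINE/ONLINE, or None if it is unrelated."""
    if any(marker in text for marker in OFFLINE_MARKERS):
        return "OFFLINE"
    if any(marker in text for marker in ONLINE_MARKERS):
        return "ONLINE"
    return None


def hosts_still_offline(events: Iterable[Event]) -> List[Tuple[str, int]]:
    """Return (host, time) for hosts whose most recent status is OFFLINE,
    newest first. Events may arrive in any order."""
    latest = {}  # host -> (time, status) of the most recent status event
    for ev in events:
        status = classify(ev.text)
        if status is None:
            continue  # not an up/down message
        prev = latest.get(ev.host)
        if prev is None or ev.time >= prev[0]:
            latest[ev.host] = (ev.time, status)
    offline = [(host, t) for host, (t, s) in latest.items() if s == "OFFLINE"]
    return sorted(offline, key=lambda pair: pair[1], reverse=True)
```

A host that went offline and later logged an online message drops out of the result, while a host that bounced and ended on a failure message stays in. As in the search, a host only shows up if its last status event falls inside the set of events being examined.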
