Hello, I'd imagine someone's already had this issue and solved it but I can't find it in Answers and hope someone can help.
We connect with a large set of clients and want to know when any of them go down. We've created this search so far to accomplish this:
sourcetype=tandem* "Host is OFFLINE" OR "Host is ONLINE"
| rex field=Text "Host is (?P<Status>\w+)"
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="OFFLINE"
| sort - _time
I needed to expand which messages count as a host being up or down, so I set up 2 event types (listed below). I've also found that if a host has been down longer than the search time range (default is 24hr), it won't get picked up.
HostOffline: (sourcetype=tandem* "Host is OFFLINE" OR "Host COMMUNICATIONS FAILURE")
HostOnline: (sourcetype=tandem* "Host is ONLINE" OR "Host COMMUNICATIONS resumed")
But when I substitute these event types into my search it returns a lot of false positives. Is there a better way to go about this? Any information is greatly appreciated, thanks!
Thanks everyone for your help and suggestions. After more research I ended up writing this search that solved it:
index=cardservices sourcetype=tandem:epoc eventtype=HostOnline OR eventtype=HostOffline HostLogo=*
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| stats latest(Status) as Status, latest(_time) as _time by HostLogo, entity, Sys
| where Status="Offline"
| sort - _time
@PDXKiel
You might want to go with the dedup command, which keeps the most recent event for each unique combination of the fields you specify.
So, something like:
<Your Search>| dedup Status, HostLogo, entity, Sys | table <your fields>
Tweak it as required for your search.
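As a rough sketch of how that could look with the event types from the question (untested against your data): the index, sourcetype, and field names are taken from this thread, and Status is deliberately left out of the dedup field list here, on the assumption that you only want the single most recent event per host rather than the most recent event per host and status.

```spl
index=cardservices sourcetype=tandem:epoc (eventtype=HostOnline OR eventtype=HostOffline) HostLogo=*
| eval Status=case(eventtype="HostOnline", "Online", eventtype="HostOffline", "Offline")
| dedup HostLogo, entity, Sys
| where Status="Offline"
| table _time, HostLogo, entity, Sys, Status
```

This relies on events coming back in reverse time order, so the first event dedup keeps for each host is its newest one.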
Thanks
Please let me know how it goes. Accept the answer if it helped, or reply if there is anything more you'd like to discuss.
Not really sure what the requirement is here, can you elaborate?
Why not search only for the failures and present the table you want with stats? No need for the where clause.
Hi adonio, thanks for the quick response. I want to create a dashboard panel that refreshes every minute or so and gives us a list of clients that haven't come back up, no matter how long they've been down. There are 2 conditions in the logs that tell us when they go down and when they come back up (see the eventtypes I created). They can bounce several times and still be down if we don't see one of the messages indicating they've come back up. That is why searching only for the failures isn't enough: the idea was to use latest() to get each host's most recent status and then show only the hosts where that status is "OFFLINE".
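To make that concrete, here is a minimal sketch that can be pasted into a search bar without any of our data. It uses makeresults to fake five events, so HOSTA, HOSTB, the counter n, and the 10-minute spacing are all made up for illustration. HOSTA goes offline, comes back, and goes offline again; HOSTB goes offline and then recovers.

```spl
| makeresults count=5
| streamstats count as n
| eval _time=now() - n*600
| eval HostLogo=if(n%2==1, "HOSTA", "HOSTB")
| eval Status=case(n==1 OR n==4 OR n==5, "Offline", true(), "Online")
| stats latest(Status) as Status, latest(_time) as _time by HostLogo
| where Status="Offline"
| sort - _time
```

HOSTB's most recent event is Online, so only HOSTA should be left after the where clause. A search for failures alone would list both hosts, which is the false positive I'm trying to avoid.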