Splunk Enterprise

How to check Up/Down hosts with Splunk and/or the Telegraf Agent?

Mark90

We have tried several ways to determine whether a server is up or down, but none of them has worked for us.

We monitor our infrastructure with the Telegraf Agent. As far as we know, Telegraf does not expose a built-in "up" metric for the agent itself, so we queried an arbitrary metric and filled in nulls wherever the query returned no data:

[Screenshot of the query: Mark90_0-1623096134768.png]

We tested this live by stopping the agent on one server. The server simply disappeared from the list instead of showing as down.
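
One possible reason for the disappearing host: fillnull can only fill fields in rows that the search actually returns, and a host that sent nothing in the time range produces no row at all. A minimal sketch of one workaround is to append a static inventory of expected hosts and then count samples per host. Everything named here is an assumption for illustration: the metrics index `telegraf`, the metric `cpu.usage_idle`, and a lookup file `expected_hosts.csv` with a single `host` column. The search is meant to run over a short time range such as the last 5 minutes.

```spl
| mstats count(_value) AS samples WHERE index=telegraf AND metric_name="cpu.usage_idle" BY host
| inputlookup append=true expected_hosts.csv
| eval host=lower(host)
| stats sum(samples) AS samples BY host
| fillnull value=0 samples
| eval status=if(samples > 0, "up", "down")
| table host samples status
```

Because every host in the lookup contributes a row even when it reported no samples, a stopped agent should show up with `samples=0` and `status=down` rather than vanish.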


So we tried a pure Splunk-based query to see whether it would solve the reliability problem we were having with Telegraf:


| tstats latest(_time) AS latest where index=* earliest=-24h BY host
| eval host=lower(host)
| eval recent=if(latest > relative_time(now(),"-5m"),1,0), realLatest=strftime(latest, "%c")
| where recent=0
| table host latest recent realLatest

Our understanding is that "recent=0" means the host has not sent any event in the last 5 minutes, so from our end it can be considered "down". The problem is that most of the "recent=0" servers were still showing up in Telegraf and sending metrics normally.
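
One possible explanation for this mismatch, offered as an assumption rather than a confirmed diagnosis: tstats with index=* reads event indexes, whereas Telegraf data stored in a metrics index is read with mstats, so a host that only sends metrics would look silent to the query above. The sketch below applies the same 5-minute recency check to metric data. The index name `telegraf` and the metric `cpu.usage_idle` are hypothetical placeholders, and the search is assumed to run with the time range picker set to the last 24 hours.

```spl
| mstats count(_value) AS samples WHERE index=telegraf AND metric_name="cpu.usage_idle" BY host span=1m
| eval host=lower(host)
| stats max(_time) AS latest BY host
| eval recent=if(latest > relative_time(now(),"-5m"),1,0), realLatest=strftime(latest, "%c")
| where recent=0
| table host latest recent realLatest
```

The span=1m clause makes mstats emit a _time value per bucket, which lets stats recover the most recent minute in which each host reported. If the Telegraf host tag differs from the Splunk host field, the BY clause would need to use whichever dimension carries the server name.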

Is there any reliable way to monitor up/down hosts in Splunk?

 


 
