Splunk Enterprise

How to check Up/Down hosts with Splunk and/or the Telegraf Agent?

Mark90
Explorer

We have tried several ways to verify whether a server is up or down, but none of them is working for us.

We monitor our infrastructure with the Telegraf agent. As far as we know, Telegraf does not have a built-in "up" metric for the agent itself, so we ran a query against an arbitrary metric and filled nulls wherever the query returned no data:

Mark90_0-1623096134768.png

We tested this live by turning off the agent on a server, and the server simply disappeared from the list instead of showing as down.
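
The behaviour described above can be reproduced outside Splunk. The following is a minimal Python sketch, not Telegraf's or Splunk's implementation: filling nulls can only patch rows that a query returns, so a host that reports nothing has no row to fill. The `expected_hosts` inventory, the host names, and the function name `host_status` are hypothetical and introduced only for illustration.

```python
def host_status(query_rows, expected_hosts):
    """Map every expected host to "up" or "down".

    query_rows: dict of host -> metric value (None if the value was null),
                containing only the hosts the query actually returned.
    expected_hosts: hypothetical inventory of hosts that should be reporting.
    """
    status = {}
    for host in expected_hosts:
        # .get() yields None both for a null value and for a missing row
        value = query_rows.get(host)
        status[host] = "up" if value is not None else "down"
    return status


# Hypothetical data: the agent on web02 is stopped, so no row comes back for it.
rows = {"web01": 12.5, "db01": 3.1}
inventory = ["web01", "web02", "db01"]
print(host_status(rows, inventory))
```

Iterating over the inventory rather than over the query results is what keeps the silent host in the output; whether such an inventory is available depends on the environment.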


So we tried a pure Splunk-based query to see whether it would solve the reliability problem we were having with Telegraf:


| tstats latest(_time) AS latest where index=* earliest=-24h BY host
| eval host=lower(host)
| eval recent=if(latest > relative_time(now(),"-5m"),1,0), realLatest=strftime(latest, "%c")
| where recent=0
| table host latest recent realLatest

We understand that "recent=0" means the host has not sent any event in the last five minutes, so from our end it can be considered "down". The problem is that most of the "recent=0" servers were still showing up in Telegraf and sending metrics normally.
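
To make the intent of the query explicit, here is a minimal Python sketch of the same staleness rule: a host counts as not recent when its newest event is more than five minutes old. The function name `stale_hosts`, the sample hosts, and the timestamps are hypothetical; the sketch only mirrors the `recent` logic and says nothing about how Splunk evaluates `tstats`.

```python
import time

STALE_AFTER_SECONDS = 5 * 60  # same window as relative_time(now(), "-5m")


def stale_hosts(latest_by_host, now=None):
    """Return the lowercased hosts whose newest event is older than the window.

    latest_by_host: dict of host -> epoch seconds of that host's newest event.
    now: epoch seconds to compare against; defaults to the current time.
    """
    now = time.time() if now is None else now
    cutoff = now - STALE_AFTER_SECONDS
    # The query sets recent=1 when latest > cutoff; keep only the recent=0 hosts.
    return sorted(
        host.lower()
        for host, latest in latest_by_host.items()
        if latest <= cutoff
    )


# Hypothetical sample: Web01 reported 30 seconds ago, DB01 15 minutes ago.
now = 1_700_000_000
sample = {"Web01": now - 30, "DB01": now - 900}
print(stale_hosts(sample, now=now))
```

Like the query, this only sees hosts that appear in the input at all: a host with no events in the searched time range is absent from the result rather than flagged.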

Is there any reliable way to monitor up/down hosts in Splunk?

 

