We have the following health check that reports whether a server is up or down:
| tstats latest(_time) as latest where index=os_solaris source="Unix:Uptime" by host
| convert timeformat="%b %d, %Y %H:%M:%S" ctime(latest) AS Last_Log
| where latest < relative_time(now(), "-10m")
| table Last_Log, host
What would be a better approach that won't produce false alerts for 100 servers at once, as happened today?
This has been solved many times, including:
Meta Woot!: https://splunkbase.splunk.com/app/2949/
TrackMe: https://splunkbase.splunk.com/app/4621/
Broken Hosts App for Splunk: https://splunkbase.splunk.com/app/3247/
Alerts for Splunk Admins ("ForwarderLevel" alerts): https://splunkbase.splunk.com/app/3796/
Splunk Security Essentials (data availability feature: https://docs.splunksecurityessentials.com/features/sse_data_availability/): https://splunkbase.splunk.com/app/3435/
Monitoring Console: https://docs.splunk.com/Documentation/Splunk/latest/DMC/Configureforwardermonitoring
Deployment Server: https://docs.splunk.com/Documentation/DepMon/latest/DeployDepMon/Troubleshootyourdeployment#Forwarde...
For the simplest form of this check, see the "check server is up or not" discussion, which covers it.
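If you want to keep your own search rather than install one of the apps above, one common way to avoid a mass false alert is to treat "many hosts went silent at the same moment" as a data pipeline problem (forwarders, indexers, network) rather than as that many servers being down. The search below is a minimal sketch of that idea, built on your original query. The field names `is_silent`, `total_hosts` and `silent_hosts` and the 20% cut-off are hypothetical choices for illustration, not values from any app or Splunk default; tune them for your environment.

```spl
| tstats latest(_time) as latest where index=os_solaris source="Unix:Uptime" by host
| eval is_silent = if(latest < relative_time(now(), "-10m"), 1, 0)
| eventstats count as total_hosts, sum(is_silent) as silent_hosts
| where is_silent = 1 AND silent_hosts < 0.2 * total_hosts
| convert timeformat="%b %d, %Y %H:%M:%S" ctime(latest) AS Last_Log
| table Last_Log, host
```

How it works: `eval` flags each host that has not reported in the last 10 minutes, `eventstats` adds the total number of hosts and the number of silent hosts to every row, and `where` only keeps silent hosts when fewer than 20% of all hosts are silent. When a larger share goes quiet at once, this search returns nothing, so you would likely want a second alert on `silent_hosts >= 0.2 * total_hosts` that pages whoever owns the forwarding and indexing tier instead. Also note that `tstats` only sees hosts with events in the alert's time range, so that range should be comfortably longer than the 10-minute threshold, or a host that has been down for a long time will drop out of the results entirely.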