Hi,
Suppose we have 10 heavy forwarders and want to get alerted if any one of them goes down.
How do we build an alert query for that?
index=_internal source=*splunkd.log*
This may work for a single server, but how do we extend the query to cover multiple servers?
If we use
index=_internal source=*splunkd.log* | stats count by host
it won't catch a down host, because a host that is down sends no events and so never appears in the result set.
Create a lookup (here hfwd_hosts) with a host column listing every heavy forwarder, and use it in the query below.
index=_internal host=*hfwd* | stats count by host
| append [ | inputlookup hfwd_hosts | table host ]
| stats sum(count) as count by host
| fillnull value=0
| where count=0
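To see why the appended lookup rows are what make this work, here is a small Python model of the pipeline. It is not Splunk code: the event list and the hostnames hfwd01 to hfwd03 are hypothetical, and each SPL stage is mimicked with plain dictionaries. A host with no events never gets a row from the first stats, so the lookup supplies one with no count; the second stats has nothing to sum for it, fillnull turns that into 0, and where count=0 keeps only the silent hosts.

```python
from collections import Counter


def silent_hosts(events, expected_hosts):
    """Model of: stats count by host | append [inputlookup] | stats sum(count)
    by host | fillnull value=0 | where count=0 (hypothetical data, not Splunk)."""
    # stats count by host: only hosts that actually sent events get a row.
    rows = [{"host": h, "count": c} for h, c in Counter(e["host"] for e in events).items()]
    # append [ | inputlookup hfwd_hosts | table host ]: one row per expected host, no count.
    rows += [{"host": h} for h in expected_hosts]
    # stats sum(count) as count by host: None means "no count values seen".
    sums = {}
    for row in rows:
        sums.setdefault(row["host"], None)
        if "count" in row:
            sums[row["host"]] = (sums[row["host"]] or 0) + row["count"]
    # fillnull value=0: hosts that never had a count become 0.
    filled = {h: (0 if s is None else s) for h, s in sums.items()}
    # where count=0: keep only hosts that sent nothing.
    return sorted(h for h, c in filled.items() if c == 0)


events = [{"host": "hfwd01"}, {"host": "hfwd01"}, {"host": "hfwd03"}]
print(silent_hosts(events, ["hfwd01", "hfwd02", "hfwd03"]))  # prints ['hfwd02']
```

A host that appears in the base search but is missing from the lookup still ends up with a non-zero count, so it is never flagged; the lookup has to list every forwarder you want to watch.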
If you take what you already have and add a couple of lines, you have an instant alert:
index=_internal source=*splunkd.log*
| stats dc(host) AS count values(host)
| where count < <known_number_of_hosts>
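The same check as a small Python model (hypothetical event data and expected host count; not Splunk code) shows what this alert can and cannot tell you: dc(host) only reveals that fewer hosts than expected reported, and values(host) lists the ones that did, so working out which host is missing is left to you.

```python
def fewer_hosts_than_expected(events, known_number_of_hosts):
    """Model of: stats dc(host) AS count values(host) | where count < N."""
    present = sorted({e["host"] for e in events})  # values(host): distinct hosts seen
    count = len(present)                           # dc(host): number of distinct hosts
    # where count < N: the single summary row survives only if hosts are missing.
    if count < known_number_of_hosts:
        return {"count": count, "values(host)": present}
    return None  # no result row, so the alert does not fire


events = [{"host": "hfwd01"}, {"host": "hfwd01"}, {"host": "hfwd03"}]
print(fewer_hosts_than_expected(events, 3))
# prints {'count': 2, 'values(host)': ['hfwd01', 'hfwd03']}
```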
This needs either a second stats or simply dc(host) in the existing one. As written, count gives the number of events per host, not the number of hosts.
Correct, I really messed that up the first time. Corrected now.
Add your heavy forwarders as search peers to your Monitoring Console and enable the "DMC Alert - Search Peer Not Responding" alert.
That alert checks only the indexers, and even then only the management port (8089).
@Yorokobi is right: if you add the HFs as search peers on your Monitoring Console, the MC will contact them via port 8089, and you can use its built-in alert to get a notification when one of them goes down. This actually works for any Splunk instance, whether indexer, search head, or HF.
You can use the metadata command and alert when a forwarder has not reported for longer than a given threshold (300 seconds here):
| metadata type=hosts | eval age = now() - lastTime | search age > 300
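As a rough Python model of that filter (hypothetical hostnames and timestamps; lastTime is treated as a Unix epoch time in seconds, which is how metadata reports it): each host's age is the current time minus its lastTime, and anything older than the threshold is flagged.

```python
import time


def stale_hosts(metadata_rows, threshold=300, now=None):
    """Model of: | metadata type=hosts | eval age = now() - lastTime | search age > 300."""
    now = time.time() if now is None else now  # now(): current epoch seconds
    stale = []
    for row in metadata_rows:
        age = now - row["lastTime"]  # eval age = now() - lastTime
        if age > threshold:          # search age > 300
            stale.append((row["host"], age))
    return stale


rows = [
    {"host": "hfwd01", "lastTime": 1_000_000 - 30},
    {"host": "hfwd02", "lastTime": 1_000_000 - 900},
]
print(stale_hosts(rows, now=1_000_000))  # prints [('hfwd02', 900)]
```

A host that has never sent any data has no metadata row at all, so unlike the lookup approach, this check cannot flag it.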
However, the docs warn: "In environments with large numbers of values for each category, the data might not be complete. This is intentional and allows the metadata command to operate within reasonable time and memory usage."
I don't think metadata can produce accurate results; I can't see it working reliably.
You can also narrow the search down to particular hosts:
| metadata type=hosts | search host= | eval age = now() - lastTime | search age > 300
OR
| metadata type=hosts | search host=testweb* | eval age = now() - lastTime | search age > 300