We had a server reboot this weekend, and the Universal Forwarder service did not start afterward. I want to get alerts about this, and after some digging it looks like the Deployment Monitor app can do this.
I have set up a 15-minute cron job to look for dm_missing_forwarders, but now I am getting emails about a server that we shut down temporarily.
Can anyone tell me what dm_missing_forwarders is looking for? Its description states: "A missing forwarder has connected at some point in the past, but has not connected in the last 24 hours"
Does "at some point" mean any time at all? How do we exclude servers from this list?
If there are better suggestions on how to monitor services (within Splunk), please let me know. I have some other monitoring tools, but I don't want to load agents on my servers if I can use Splunk.
Basically all I want is...
If a host has sent events in the last 24 hours, but you have nothing from it in the last hour, send an alert.
If (event count from * host in timespan 24h > 0) and (event count from * host in timespan 1h = 0) then send alert
If my logic is correct, a search like this would send an email whenever a host has been active in the last 24 hours but nothing has been received from it in the last hour.
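To make the rule above concrete, here is a minimal sketch of the same logic in plain Python (not Splunk code). The function name `silent_hosts` and the `last_seen` mapping of host name to last event time (epoch seconds) are hypothetical, made up only for illustration; the 24-hour and 1-hour windows come from the description above.

```python
from typing import Dict, List


def silent_hosts(
    last_seen: Dict[str, float],
    now: float,
    active_window: float = 24 * 3600,  # host must have sent something within 24h
    quiet_window: float = 3600,        # ...but nothing within the last hour
) -> List[str]:
    """Return hosts seen within active_window but silent for quiet_window."""
    alerts = []
    for host, ts in last_seen.items():
        age = now - ts  # seconds since the host's most recent event
        if quiet_window < age <= active_window:
            alerts.append(host)
    return sorted(alerts)


# Hypothetical sample data: one healthy host, one recently silent host,
# and one host that has been off for days (should not alert).
now = 100000.0
last_seen = {
    "web01": now - 600,        # 10 minutes ago: healthy
    "db01": now - 7200,        # 2 hours ago: alert
    "old01": now - 3 * 86400,  # 3 days ago: outside the 24h window
}
print(silent_hosts(last_seen, now))
```

Note that a host that stays down for more than 24 hours drops out of the result on its own, which is the behaviour I want for servers we shut down on purpose.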
Thanks for your help
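Try scheduling this to run periodically (for example, once every hour). It lists hosts whose most recent event is more than 1 hour (3600 seconds) but less than 24 hours (86400 seconds) old:
index=* | dedup host | stats count first(_time) AS lastTime by host | eval age=now()-lastTime | where age>3600 AND age<86400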
It looks like I also get report emails when no results are found, which could get annoying with the way I have it set up right now.
The scheduled report 'DM missing forwarders' has run.
No results found.
I am having the same issue with the Deployment Monitor app. The search for forwarders that have not connected in the last 24 hours does not appear to work. What version of Splunk are you running? I believe this may have started after upgrading to version 6.2.
Anyone? Please / thanks!