We want an alert any time a forwarder stops sending data or its service stops. I've installed the Splunk Deployment Monitor app and tested it by stopping the service on a forwarder: for about 15 minutes the app still showed the forwarder as active, and then it simply disappeared from the list. Ideally it would show as Inactive and send an alert that the forwarder is down.
After researching, I saw that if we upgrade to 6.3 we get a native app built in that can tell us when a forwarder is down. So my question is: should we stay on our 6.2 version and put more time into configuring the Splunk Deployment Monitor app, or should we upgrade to 6.3 and use that native app? Is the native app in 6.3 more intuitive than the Splunk Deployment Monitor app?
We're currently running version 6.2
Edit: The native app I was referring to is the Distributed Management Console (DMC). I see now that it's already included in my 6.2 installation. Can anyone advise how I can set up alerting inside this app to tell us when a UF is down?
Thanks for the help, everyone! Unfortunately I'm still running version 6.2, so I can't use the DMC's forwarder monitoring until I upgrade to 6.3. But I did come up with a temporary solution to monitor my forwarders until I upgrade Splunk.
I used the search below to list all my forwarders, saved it as an alert scheduled to run every 5 minutes, and set the trigger condition to "number of results != 249" (the number of forwarders we currently have). I tested this by stopping one of the forwarders, and sure enough it sent an alert saying a forwarder is down.
The only drawback is that if we add another forwarder I then have to update the hard-coded value in the alert settings, but this is only temporary, so it will work for now.
index=_internal sourcetype=splunkd destPort!="-"
| stats sparkline count by hostname, sourceHost, host, destPort, version
| rename destPort as "Destination Port"
| rename host as "Indexer"
| rename sourceHost as "Forwarder IP"
| rename version as "Splunk Forwarder Version"
| rename hostname as "Forwarder Host Name"
| rename sparkline as "Traffic Frequency"
| sort - count
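For anyone who prefers editing config files, here is a rough savedsearches.conf sketch of the same alert. This is an assumption-laden example, not what the UI literally generates: the stanza name and email address are placeholders, and the full search above would go on the search = line as a single line.
# Sketch of the scheduled alert (stanza name and email address are placeholders)
[Forwarder count check]
search = index=_internal sourcetype=splunkd destPort!="-" | stats count by hostname, sourceHost, host, destPort, version
enableSched = 1
cron_schedule = */5 * * * *
dispatch.earliest_time = -5m@m
dispatch.latest_time = now
counttype = number of events
relation = not equal to
quantity = 249
alert.track = 1
action.email = 1
action.email.to = splunk-admins@example.com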
If you want to go this direction, consider creating a lookup table that contains a list of the forwarders.
Then you can do a search like this
index=_internal sourcetype=splunkd destPort!="-"
| stats count by hostname, sourceHost, host, destPort, version
| append [ inputlookup host_lookup | eval count=0 | rename host as hostname ]
| stats sum(count) as count, values(sourceHost) as sourceHost, values(host) as host, values(destPort) as destPort, values(version) as version by hostname
| rename destPort as "Destination Port"
| rename host as "Indexer"
| rename sourceHost as "Forwarder IP"
| rename version as "Splunk Forwarder Version"
| rename hostname as "Forwarder Host Name"
| sort - count
This appends every forwarder from the lookup with a count of zero; the final stats groups only by hostname (otherwise the appended rows, which have no sourceHost, host, destPort, or version, would be dropped). Any hostname whose total count is zero is missing. You lose the sparkline and you have to change your alert. Or you could just use this for the alert, and something else for the report:
index=_internal sourcetype=splunkd destPort!="-"
| stats count by hostname
| append [ inputlookup host_lookup | eval count=0 | rename host as hostname ]
| stats sum(count) as count by hostname
| where count < 1
With an alert condition of "One or more results".
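In savedsearches.conf terms (same placeholder caveats as the sketch further up), that trigger condition amounts to:
counttype = number of events
relation = greater than
quantity = 0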
This assumes a csv file has been uploaded as a lookup table file and that a lookup definition named host_lookup has been created. If you add more forwarders you will need to re-upload the csv file, but you won't have to edit your searches. The csv file could look like this:
host,owner,email
forwarder1,skoelpin,skoelpin@gmail.com
etc. You actually only need the "host" field, but I find that a lookup table like this can have many applications...
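If you'd rather define the lookup by hand instead of through Settings > Lookups, a minimal transforms.conf sketch (host_lookup.csv is an assumed name for the uploaded file) looks like this:
# transforms.conf - lookup definition referenced by "inputlookup host_lookup"
[host_lookup]
filename = host_lookup.csv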
The feature to monitor forwarders in the DMC was added in 6.3.
A simple Splunk search on metadata should help.
| metadata type=hosts index=os index=_internal
| eval age = now() - recentTime
| eval status = case(age < 1800, "Running", age >= 1800, "DOWN")
| convert ctime(recentTime) AS LastActiveOn
| eval age = tostring(age, "duration")
| eval host = upper(host)
| table host age LastActiveOn status
| rename host as "Forwarder Name", age as "Last Heartbeat (duration)", LastActiveOn as "Last Active On", status as Status
| where Status = "DOWN"
The 1800-second threshold depends on how often your forwarders send data; adjust it to your interval.
If you want to "roll your own search," you might take a look at this answer:
how to provide a status of a forwarder
You should really be basing your alert on whether the forwarder connected using the _internal index on the indexers. I am not sure that the metadata command really gives sufficient information.
This is good but it's only returning half my forwarders and showing them all as DOWN. Any idea why?
What is your phoneHomeInterval set to? The 1800-second threshold in the search should be set to match that value.
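If you're not sure where to check, the phone-home interval is configured on each forwarder in deploymentclient.conf (the values below are placeholders, assuming the forwarder is a deployment client):
# deploymentclient.conf on the forwarder - placeholder values
[target-broker:deploymentServer]
targetUri = deployment-server.example.com:8089
phoneHomeIntervalInSecs = 60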
I recommend the DMC; in 6.3 it has a component for monitoring forwarders, and you can set an alert for when a forwarder goes "missing", meaning it hasn't sent any data in 15 minutes. That is just a checkbox! You can also tune the alert to run more or less often.
The DMC is gradually taking over everything the Deployment Monitor app and the SOS (Splunk on Splunk) app used to provide, plus more. If it does what you want, that's really the way to go.
Would I be able to set up forwarder monitoring on version 6.2?
Thanks for the response! I found out about the DMC at about the same time as you posted this. I also found this doc, which outlines forwarder monitoring. I'll let you know my results once I set it up!
http://docs.splunk.com/Documentation/Splunk/6.3.0/DMC/Configureforwardermonitoring