
Alert if a forwarder service stops

skoelpin
SplunkTrust

We want an alert any time a forwarder stops sending data or the service stops. I've installed the Splunk Deployment Monitor app and tested it by stopping the service on a forwarder; for about 15 minutes the app still showed the forwarder as active, and then it simply disappeared from the list. My ideal situation would be for it to show as Inactive and to send an alert that a forwarder is down.

After researching, I saw that if we upgrade to 6.3, it will have a native app built in that will let us know when a forwarder is down. So my question is: should we continue using our 6.2 version and put more time into configuring the Splunk Deployment Monitor app, or should we upgrade to 6.3 and use that native app? Is the native app in 6.3 more intuitive than the Splunk Deployment Monitor app?

We're currently running version 6.2

Edit: The native app I was referring to is the Distributed Management Console (DMC). I see now that it's already installed in my 6.2 version. Can anyone advise how I can set up alerting inside this app to tell us if a UF is down?

1 Solution

skoelpin
SplunkTrust

Thanks for helping, everyone! Unfortunately I'm still running version 6.2, so I can't use the DMC's forwarder monitoring until I upgrade to 6.3. But I did come up with a temporary solution to monitor my forwarders until I upgrade Splunk.

I used the search below to list all my forwarders. I then saved the search as an alert, scheduled it to run every 5 minutes, and set the trigger condition to "Trigger if number of results != 249" (249 being our current forwarder count). I tested this by stopping one of the forwarders and, sure enough, it sent an alert saying a forwarder is down.

The only drawback is that if we add another forwarder I then have to update the hard-coded value in the alert settings, but this is only temporary, so it will work for now.

index=_internal sourcetype=splunkd destPort!="-"
| stats sparkline count by hostname, sourceHost, host, destPort, version 
| rename destPort as "Destination Port" 
| rename host as "Indexer" 
| rename sourceHost as "Forwarder IP" 
| rename version as "Splunk Forwarder Version" 
| rename hostname as "Forwarder Host Name" 
| rename sparkline as "Traffic Frequency" 
| sort - count
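
In case it's useful, the saved alert ends up looking roughly like this in savedsearches.conf. This is only a sketch: the stanza name, schedule, and email address are made up, and 249 is the hard-coded forwarder count mentioned above.

# illustrative stanza - adjust the schedule, threshold, and email to your environment
[Forwarder down - count check]
search = index=_internal sourcetype=splunkd destPort!="-" | stats count by hostname, sourceHost, host, destPort, version
cron_schedule = */5 * * * *
dispatch.earliest_time = -5m
dispatch.latest_time = now
enableSched = 1
counttype = number of events
relation = not equal to
quantity = 249
alert.track = 1
action.email = 1
action.email.to = splunk-admins@example.com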

lguinn2
Legend

If you want to go this direction, consider creating a lookup table that contains a list of the forwarders.

Then you can do a search like this

 index=_internal sourcetype=splunkd destPort!="-"
 | stats count by hostname, sourceHost, host, destPort, version 
 | append [ inputlookup host_lookup | eval count=0 | rename host as hostname ]
 | stats sum(count) as count values(sourceHost) as sourceHost values(host) as host values(destPort) as destPort values(version) as version by hostname 
 | rename destPort as "Destination Port" 
 | rename host as "Indexer" 
 | rename sourceHost as "Forwarder IP" 
 | rename version as "Splunk Forwarder Version" 
 | rename hostname as "Forwarder Host Name" 
 | sort - count

This adds in a list of all the forwarders with a count of zero. Combining that with the stats results means that any hostname with a count of zero is missing. You lose the sparkline, and you have to change your alert. Or you could just use this for the alert, and something else for this report:

 index=_internal sourcetype=splunkd destPort!="-"
 | stats count by hostname, sourceHost, host, destPort, version 
 | append [ inputlookup host_lookup | eval count=0 | rename host as hostname ]
 | stats sum(count) as count by hostname 
 | where count < 1

With an alert condition of "One or more results".

This assumes a CSV file has been uploaded as a lookup table file and that a lookup definition has been created. If you add more forwarders, you will need to re-upload the CSV file, but you won't have to edit your searches. The CSV file could look like this:

host,owner,email
forwarder1,skoelpin,skoelpin@gmail.com

etc. You actually only need the "host" field, but I find that a lookup table like this can have many applications...
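
If you would rather not build the list entirely by hand, one option (just a sketch, reusing the host_lookup name from above) is to seed the lookup from the forwarders Splunk has already seen, and then fill in the owner and email columns afterwards:

 index=_internal sourcetype=splunkd destPort!="-" earliest=-30d
 | stats count by hostname
 | rename hostname as host
 | table host
 | outputlookup host_lookup

Note that outputlookup overwrites the lookup, so any owner/email columns you added by hand would be lost; treat this as a one-time starting point.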

lguinn2
Legend

The feature to monitor forwarders was added in 6.3

menonmanish
Path Finder

A simple Splunk search on metadata should help.
| metadata type=hosts index=os index=_internal
| eval age = now() - recentTime
| eval status = case(age < 1800, "Running", age >= 1800, "DOWN")
| convert ctime(recentTime) AS LastActiveOn
| eval age = tostring(age, "duration")
| eval host = upper(host)
| table host age LastActiveOn status
| rename host as "Forwarder Name", age as "Last Heartbeat", LastActiveOn as "Last Active On", status as Status
| where Status = "DOWN"

The 1800-second threshold depends on your interval.

lguinn2
Legend

If you want to "roll your own search," you might take a look at this answer:

how to provide a status of a forwarder

You should really base your alert on whether the forwarder has connected, using the _internal index on the indexers. I am not sure that the metadata command really gives sufficient information.
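
As a very rough sketch of that idea (the 15-minute threshold is arbitrary, and you would still want something like the lookup approach above to catch forwarders that have been down longer than the search window):

 index=_internal sourcetype=splunkd destPort!="-" earliest=-24h
 | stats latest(_time) as lastSeen by hostname
 | eval minutesSinceLastSeen = round((now() - lastSeen) / 60, 1)
 | where minutesSinceLastSeen > 15

Run it over a window longer than the threshold (here the last 24 hours) so a recently stopped forwarder still has earlier events to compare against.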

skoelpin
SplunkTrust

This is good but it's only returning half my forwarders and showing them all as DOWN. Any idea why?

menonmanish
Path Finder

What is your phoneHomeInterval set to? The 1800-second threshold in the search should match the interval you have configured.
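
For reference, that interval is set on the forwarder in deploymentclient.conf (the values below are only examples; the deployment server URI is a placeholder):

[target-broker:deploymentServer]
# illustrative values - point targetUri at your own deployment server
targetUri = deploymentserver.example.com:8089
phoneHomeIntervalInSecs = 600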

lguinn2
Legend

I recommend the DMC; as of 6.3 it has a component for monitoring forwarders. And you can set an alert when a forwarder goes "missing" - when it hasn't sent any data in 15 minutes. That is just a checkbox! You can also tune the alert to run more or less often.

The DMC is gradually taking on everything that the Deployment Monitor app and the SOS (Splunk on Splunk) app used to provide - plus more. If it will do what you want, that's really the way to go.

skoelpin
SplunkTrust

Would I be able to set up forwarder management on version 6.2?

skoelpin
SplunkTrust

Thanks for the response! I found out about the DMC at about the same time as you posted this, and I also found this doc, which outlines forwarder management. I'll let you know my results once I set it up!

http://docs.splunk.com/Documentation/Splunk/6.3.0/DMC/Configureforwardermonitoring
