How can you differentiate between a forwarder being down and a forwarder not having any data to send? I.e., is there a heartbeat that I can tap into?
If you have deployed a number of Splunk forwarders and they are all pushing data to Splunk, you might not notice if one of them goes out of service, because the other forwarders are still pushing data to Splunk. You can run the following search to detect forwarders that have been up in the last 24 hours but not in the last 2 minutes. It uses the forwarder heartbeat, which is a feature of Splunk versions 3.2 and later.
index=_internal sourcetype="fwd-hb" starthoursago=24 | dedup host | eval age = strftime("%s","now") - _time | search age > 120 age < 86000
You can set this search up as an alert every several minutes so that Splunk will let you know if any of your active forwarders have not responded in the last 2 minutes.
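To schedule that as an alert, a savedsearches.conf stanza along the following lines should work — the stanza name, schedule, and email address are illustrative placeholders, not values from this thread:

```
# Hypothetical savedsearches.conf stanza -- names, schedule, and address are examples only.
[Forwarder heartbeat missing]
search = index=_internal sourcetype="fwd-hb" starthoursago=24 | dedup host | eval age = strftime("%s","now") - _time | search age > 120 age < 86000
enableSched = 1
cron_schedule = */5 * * * *
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = splunk-admin@example.com
```

This runs the search every five minutes and emails whenever it returns at least one host.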
If you're running a version of Splunk later than 3.3, the heartbeat message is no longer sent. Use the following search instead:
index=_internal "group=tcpin_connections" | stats max(_time) as latest by sourceHost | eventstats max(latest) as latest_all | eval lag = latest_all - latest | where lag > 120 | fields sourceHost lag
The following search works in 3.4.5 and finds all hosts that haven't sent a message in the last 24 hours:
| metadata type=hosts | eval age = strftime("%s","now") - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime
and in 4.0:
| metadata type=hosts | eval age = now() - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime
Another 4.0 variant
| metadata type=hosts | sort recentTime desc | convert ctime(recentTime) as Recent_Time
Caveat: Many of these methods do not account for decommissioned hosts, which you are bound to have after a length of time. These hosts will also show up in the search results, as they also fit the criteria. Incorporating a host tag ('decommissioned', etc.) into this search may help with this, but requires you to tag known hosts that are no longer valid.
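One way to handle that caveat without tagging, assuming you maintain a lookup file of retired hosts (the file name decommissioned_hosts.csv and its host column are hypothetical), is to exclude them with a subsearch:

```
| metadata type=hosts
| search NOT [| inputlookup decommissioned_hosts.csv | fields host]
| eval age = now() - lastTime
| where age > 86400
| convert ctime(lastTime)
| fields host, age, lastTime
```

The subsearch expands into a NOT (host=... OR host=...) filter, so retired hosts drop out of the results as long as the lookup is kept up to date.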
try this
| metadata type=hosts
| eval lastHour=relative_time(now(),"-1h@h")
| eval yesterday=relative_time(now(), "-1d@d")
| where ( recentTime>yesterday AND recentTime<lastHour )
Good morning,
How can I check that the forwarders are sending the logs correctly? I have the following error in my logs:
"eventType=connect_fail" in metrics.log
metrics.log:12-17-2014 09:38:48.529 +0100 INFO StatusMgr - destHost=10.26.XX.XX, destIp=10.26.XX.XX, destPort=9997, eventType=connect_fail, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor
This error means the logs are not being sent correctly, so I need to know whether there is any option in the program execution to check if Splunk is sending the data to the server or not.
Also, can I resolve this issue in the configuration with some parameter? The issue only appears at certain times; it is not constant.
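One way to watch for this from the search head, assuming the forwarders forward their own _internal index (the default for universal forwarders), is to count the connect_fail events per forwarder and destination — a rough monitoring sketch, not a definitive fix:

```
index=_internal sourcetype=splunkd "eventType=connect_fail"
| stats count as failures, max(_time) as latest by host, destHost, destPort
| convert ctime(latest)
| sort - failures
```

A forwarder that shows intermittent failures here but still appears in the tcpin_connections searches above is probably losing its connection occasionally rather than being down outright.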
Thanks and regards.
Hi All!
I just made another test and changed a bit the logic.
I was looking for forwarders that haven't sent data for more than, say, 2 minutes. Here's my latest version:
index=_internal "group=tcpin_connections" | stats max(_time) as latest by sourceHost | eval nowtime = now() | eval lag = (nowtime - latest)/60 | where lag > 2 | fields sourceHost latest lag
If you prefer, you can of course work in seconds instead of minutes 🙂
Marco
I'm going to test this later on 4.1.3 and let you know. I need to provide our customer a dashboard to monitor all the remote forwarders at a glance.
Marco
Matt, I've tried the following search with Splunk 4.1.3:
"| metadata type=hosts | eval age = strftime("%s","now") - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime"
and I got the following error:
"Error in 'eval' command: Typechecking failed. '-' only takes numbers."
Marco