Getting Data In

How do I tell if a forwarder is down?

Path Finder

How can you differentiate between a forwarder being down and a forwarder simply having no data to send? I.e., is there a heartbeat I can tap into?

1 Solution

Splunk Employee

If you have deployed a number of Splunk forwarders and they are all pushing data to Splunk, you might not notice if one of them goes out of service, because the other forwarders are still pushing data to Splunk. You can run the following search to detect forwarders that have been up in the last 24 hours but not in the last 2 minutes. It uses the forwarder heartbeat, which is a feature of Splunk versions 3.2 and later.

index=_internal sourcetype="fwd-hb" starthoursago=24 | dedup host | eval age = strftime("%s","now") - _time | search age > 120 age < 86000

You can set this search up as an alert every several minutes so that Splunk will let you know if any of your active forwarders have not responded in the last 2 minutes.
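The age test that search performs is plain timestamp arithmetic. As a rough illustration (not Splunk code, and with hypothetical host names and timestamps), the same check in Python looks like this, using the search's 120 s and 86000 s thresholds:

```python
import time

def stale_forwarders(last_heartbeat, now=None, min_age=120, max_age=86000):
    """Return hosts whose last heartbeat is older than min_age seconds
    but newer than max_age seconds (i.e. recently active, now silent)."""
    now = time.time() if now is None else now
    return sorted(h for h, t in last_heartbeat.items()
                  if min_age < (now - t) < max_age)

# Hypothetical data: fwd1 reported 30 s ago, fwd2 went quiet 10 minutes
# ago, fwd3 has been gone for two days (outside the max_age window).
now = 1_700_000_000
beats = {"fwd1": now - 30, "fwd2": now - 600, "fwd3": now - 172_800}
print(stale_forwarders(beats, now=now))  # ['fwd2']
```

The upper bound plays the same role as `age < 86000` in the search: hosts silent for roughly a day or more are assumed retired rather than freshly down.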

If you're running a version of Splunk later than 3.3, the heartbeat message is no longer sent. Use the following search instead:

index=_internal "group=tcpin_connections" | stats max(_time) as latest by sourceHost | eventstats max(latest) as latest_all | eval lag = latest_all - latest | where lag > 120 | fields sourceHost lag
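Note that this variant measures each forwarder's lag against the most recently seen forwarder (the `eventstats max`), not against the wall clock. A minimal Python sketch of that logic, with hypothetical timestamps:

```python
def lag_vs_latest(latest_by_host, threshold=120):
    """Flag hosts whose newest event lags the most recently seen
    forwarder by more than `threshold` seconds (mirrors the
    eventstats max(latest) approach in the search)."""
    latest_all = max(latest_by_host.values())
    return {h: latest_all - t for h, t in latest_by_host.items()
            if latest_all - t > threshold}

# Hypothetical data: fwd2's newest event is 300 s behind fwd1's.
print(lag_vs_latest({"fwd1": 5000, "fwd2": 4700}))  # {'fwd2': 300}
```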

The following search works in 3.4.5 and finds all hosts that haven't sent a message in the last 24 hours:

| metadata type=hosts | eval age = strftime("%s","now") - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime

and in 4.0:

| metadata type=hosts | eval age = now() - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime

Another 4.0 variant

| metadata type=hosts | sort recentTime desc | convert ctime(recentTime) as Recent_Time

Caveat: many of these methods do not account for decommissioned hosts, which you are bound to have after a length of time. These hosts will also show up in the search results, since they fit the same criteria. Incorporating a host tag ('decommissioned', etc.) into the search may help, but requires you to tag known hosts that are no longer valid.
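Besides tagging, one non-authoritative way to handle the decommissioned-host caveat is to keep an explicit exclude list and filter it out of the report (host names here are hypothetical):

```python
def filter_decommissioned(stale_hosts, decommissioned):
    """Drop known-retired hosts from a stale-forwarder report."""
    retired = set(decommissioned)
    return [h for h in stale_hosts if h not in retired]

# 'old-web1' is retired, so only the genuinely silent host remains.
print(filter_decommissioned(["fwd2", "old-web1"], {"old-web1"}))  # ['fwd2']
```

The trade-off is the same as with tags: someone has to maintain the list as hosts are retired.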


Explorer

Try this:

| metadata type=hosts
| eval lastHour=relative_time(now(),"-1h@h")
| eval yesterday=relative_time(now(), "-1d@d")
| where ( recentTime>yesterday AND recentTime<lastHour )

Good morning,

How can I check that the forwarders are sending the logs correctly? I have the following error in my logs:

"eventType=connect_fail" in metrics.log

metrics.log:12-17-2014 09:38:48.529 +0100 INFO StatusMgr - destHost=10.26.XX.XX, destIp=10.26.XX.XX, destPort=9997, eventType=connect_fail, publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor

This event means the logs are not being sent correctly, so I need to know whether any option in the program execution can check whether Splunk is sending the data to the server or not.

Also, can I resolve this issue with some configuration parameter? The issue only appears at certain times; it isn't constant.

Thanks and regards.
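One non-authoritative way to check this from the forwarder's side is to scan metrics.log for connect_fail events like the one quoted above. A sketch (the sample line, IP, and regex below are illustrative, not an official Splunk tool):

```python
import re

# Illustrative metrics.log line modeled on the one in the question.
LINE = ("12-17-2014 09:38:48.529 +0100 INFO StatusMgr - destHost=10.26.0.1, "
        "destIp=10.26.0.1, destPort=9997, eventType=connect_fail, "
        "publisher=tcpout, sourcePort=8089, statusee=TcpOutputProcessor")

def connect_failures(lines):
    """Yield (destHost, destPort) for each connect_fail entry."""
    pat = re.compile(r"destHost=([^,]+).*?destPort=(\d+).*?eventType=connect_fail")
    for line in lines:
        m = pat.search(line)
        if m:
            yield m.group(1), m.group(2)

print(list(connect_failures([LINE])))  # [('10.26.0.1', '9997')]
```

Repeated hits for the same destHost/destPort pair would indicate the forwarder cannot reach that indexer.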


Builder

Hi All!

I just made another test and changed the logic a bit.

I was looking for forwarders that have not been sending data for more than, say, 2 minutes. Here's my latest version:

index=_internal "group=tcpin_connections" | stats max(_time) as latest by sourceHost | eval nowtime = now() | eval lag = (nowtime - latest)/60 | where lag > 2 | fields sourceHost latest lag

If you prefer, you can of course work in seconds instead of minutes 🙂
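The minute-based lag check in that search is again simple arithmetic; a Python sketch with hypothetical timestamps:

```python
import time

def lagging_forwarders_minutes(latest_by_host, now=None, max_lag_min=2):
    """Return {host: lag_in_minutes} for forwarders that have been
    silent longer than max_lag_min minutes (wall-clock based)."""
    now = time.time() if now is None else now
    lags = {h: (now - t) / 60 for h, t in latest_by_host.items()}
    return {h: lag for h, lag in lags.items() if lag > max_lag_min}

# Hypothetical data: fwd1 sent 1 minute ago, fwd2 5 minutes ago.
now = 1_700_000_000
print(lagging_forwarders_minutes({"fwd1": now - 60, "fwd2": now - 300},
                                 now=now))  # {'fwd2': 5.0}
```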

Marco


Builder

I'm going to test this later on 4.1.3 and let you know. I need to provide our customer with a dashboard to monitor all the remote forwarders at a glance.

Marco


Builder

Matt, I've tried the following search with Splunk 4.1.3:

| metadata type=hosts | eval age = strftime("%s","now") - lastTime | search age > 86400 | sort age d | convert ctime(lastTime) | fields age,host,lastTime

and I got the following error:

"Error in 'eval' command: Typechecking failed. '-' only takes numbers."

Marco

