Getting Data In

Splunk Forwarder Connection Refused from Splunk Indexer

BP9906
Builder

Ever since we added a few more Splunk Forwarders to our environment, the Splunk Server (search head, indexer, deployment server, Windows box) stopped accepting connections from the Forwarders.

We have around 30 forwarders total, all going to the Splunk server.

The Splunk server is now on 4.3.2, with no change. Restarting the Splunk server helps for about 2 minutes: the agents reconnect, then end up in a failed state after a couple of minutes.

Forwarder splunkd.log shows:

06-06-2012 11:27:11.884 -0700 INFO TcpOutputProc - Connected to idx=splunkserver:9997
06-06-2012 11:27:11.885 -0700 INFO TcpOutputProc - Connected to idx=splunkserver:9997
06-06-2012 11:28:03.981 -0700 INFO BatchReader - Removed from queue file='/opt/splunkforwarder/var/log/splunk/metrics.log.2'.
06-06-2012 11:29:41.070 -0700 INFO BatchReader - Removed from queue file='/opt/splunkforwarder/var/log/splunk/metrics.log.5'.
06-06-2012 11:29:55.226 -0700 WARN TcpOutputFd - Connect to splunkserver:9997 failed. Connection refused
06-06-2012 11:29:55.226 -0700 ERROR TcpOutputFd - Connection to host=splunkserver:9997 failed
06-06-2012 11:29:55.226 -0700 WARN TcpOutputFd - Connect to splunkserver:9997 failed. Connection refused
06-06-2012 11:29:55.226 -0700 ERROR TcpOutputFd - Connection to host=splunkserver:9997 failed
06-06-2012 11:29:55.226 -0700 INFO TcpOutputProc - Detected connection to splunkserver:9997 closed
06-06-2012 11:29:55.226 -0700 INFO TcpOutputProc - Detected connection to splunkserver:9997 closed
06-06-2012 11:29:56.553 -0700 WARN TcpOutputFd - Connect to splunkserver:9997 failed. Connection refused
06-06-2012 11:29:56.553 -0700 ERROR TcpOutputFd - Connection to host=splunkserver:9997 failed
06-06-2012 11:29:56.553 -0700 WARN TcpOutputFd - Connect to splunkserver:9997 failed. Connection refused
06-06-2012 11:29:56.553 -0700 ERROR TcpOutputFd - Connection to host=splunkserver:9997 failed
06-06-2012 11:29:56.553 -0700 WARN TcpOutputProc - Applying quarantine to idx=splunkserver:9997 numberOfFailures=2
06-06-2012 11:29:56.553 -0700 WARN TcpOutputProc - Applying quarantine to idx=splunkserver:9997 numberOfFailures=2
06-06-2012 11:30:25.221 -0700 INFO TcpOutputProc - Removing quarantine from idx=splunkserver:9997

The Splunk server's splunkd.log doesn't show much related to the inbound connections. Perhaps a debug flag needs to be set?

Any ideas?
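For reference, one way to get more detail on the receiving side is to raise the log level of the TCP input channel on the indexer, then revert it afterwards. A rough sketch, assuming the Splunk CLI is on the path and that the category name (which can vary by version) is TcpInputProc:

splunk set log-level TcpInputProc -level DEBUG
splunk set log-level TcpInputProc -level INFO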

1 Solution

BP9906
Builder

Solution found!

etc/system/local/inputs.conf

[splunktcp://9997]
connection_host = none

Restart the Splunk server and it's fixed. DNS was holding it all up.
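To confirm the setting is actually being picked up (and from which file) before restarting, btool can help; a sketch, assuming the Splunk CLI is on the path:

splunk btool inputs list splunktcp --debug
splunk restart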

Fernandisstepha
New Member

Hello Team,

I did the same as you all suggested, but it doesn't work for me.

etc/system/local/inputs.conf

[splunktcp://9997]
connection_host = none

Any other workaround?

Regards,
Steven

muez
Explorer

Where do these settings go?
On my 2 heavy forwarders, the cluster master, or all of my 10 indexers?
@BP9906 @lrudolph @msclimenti

BP9906
Builder

The documentation says it can be set at various levels in inputs.conf.
I find it easier to set connection_host = ip, since it does not perform a reverse DNS lookup and you still get the IP if the hostname is not provided by the forwarder (i.e. if it's syslog or something).

To answer your question, you would want to review the connection_host setting on any receiving instance, which would be your heavy forwarders and indexers.
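A minimal sketch of what that looks like in inputs.conf on each receiving instance (port 9997 is taken from this thread; the path assumes a local override):

# etc/system/local/inputs.conf on each heavy forwarder / indexer
[splunktcp://9997]
# 'ip' stores the sender's IP without a reverse DNS lookup; 'none' skips host resolution entirely
connection_host = ip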

woodcock
Esteemed Legend

On the indexers.

dstaulcu
Builder

Did you ever find out why DNS resolution became a problem?

msclimenti
Engager

Not sure how you figured this out but thanks a ton!!!

BP9906
Builder

I thought I'd also add that telnet splunkserver 9997 shows connection refused.
When I'm on the splunkserver box directly and do telnet localhost 9997, I get the same. netstat -ano reveals it's listening on 9997, with splunkd.exe as the process owning the port.
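For anyone repeating this check, the commands look roughly like this on a Windows indexer (the PID from netstat is fed into tasklist; 1234 is just a placeholder):

telnet splunkserver 9997
netstat -ano | findstr :9997
tasklist /FI "PID eq 1234"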

laurie_gellatly
Communicator

Yep, that's a "Me too". This little gem was causing all kinds of slowness in the delivery of events and unpredictable connections from the UFs. Adding SSL to the UF-HF connection seems to make it even worse. The UFs complained:
Connect to x.x.x.x:9997 failed. No connection could be made because the target machine actively refused it
Connection to host=x.x.x.x:9997 failed
Cooked connection to ip=x.x.x.x:9997 timed out

Thanks ...Laurie:{)

lrudolph
Path Finder

Yeah! This was finally the solution to my problem, too. Our forwarders showed a lot of "WARN TcpOutputProc - Cooked connection to ip=x.x.x.x:9997 timed out" messages in the logs, and we eventually lost data, even with two indexers and useACK=true in place. We traced it back to the unconfigured connection_host setting on the indexers, which defaulted to "dns". Since we don't use a DNS server in our network, the number of forwarders we deployed gradually slowed everything down and finally led to data that couldn't be indexed. connection_host = none solved it all.

Thank you!
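For context, useACK is a forwarder-side setting in outputs.conf; a minimal sketch, where the group name and indexer addresses are placeholders:

# outputs.conf on the forwarders
[tcpout:my_indexers]
server = indexer1:9997,indexer2:9997
useACK = true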

AaronMoorcroft
Communicator

The connection_host setting, where is that? And in this case, was it on the indexer(s) or the forwarder that you changed it?

woodcock
Esteemed Legend

This is a setting on the indexers.

BP9906
Builder

Yep, and that window is right after a restart of the Splunk server (i.e. the splunk.exe restart command). After that short window, connections from all the forwarders are refused again.

sowings
Splunk Employee

So is there some window in which telnet splunkserver 9997 does work?

BP9906
Builder

Windows Firewall allows it, especially since the agents connect right after I restart the Splunk indexer (splunk.exe restart). 2-4 minutes after the indexer restart, they disconnect and connections are refused; then after about 5 minutes the Splunk server starts accepting the TCP connections again, but no data is received by the indexer.

kreszan
Explorer

I have the same issue. What was your resolution? I'm on 6.1.5 now.

sowings
Splunk Employee

Firewalls in play?
