Deployment Architecture

Why is my UF not connecting to HF and DS?

BalaMandalapu24
Loves-to-Learn Everything

My UF is configured to use the deployment server (DS) on port 8089 and the HF on port 9997, but neither connection is working.

Troubleshooting steps performed:

1. Disabled the iptables firewall.

2. All servers are in the same subnet, so I believe there is no network firewall issue.

3. Configured outputs.conf under /opt/splunkforwarder/etc/system/local.

[root@gcpas-d-sial02 ~]# tail -f /opt/splunkforwarder/var/log/splunk/splunkd.log
09-08-2022 05:41:22.535 +0000 WARN TcpOutputProc [13177 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=10.236.65.143 inside output group HF from host_src=gcpas-d-sial02 has been blocked for blocked_seconds=34900. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
09-08-2022 05:41:28.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:28.180 +0000 INFO DC:PhonehomeThread [13138 PhonehomeThread] - Attempted handshake 2910 times. Will try to re-subscribe to handshake reply
09-08-2022 05:41:32.336 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out
09-08-2022 05:41:40.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:52.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:52.247 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out
09-08-2022 05:41:54.623 +0000 WARN HttpPubSubConnection [13137 HttpClientPollingThread_A4B05094-DB53-4495-B31D-853E566CE7E0] - Unable to parse message from PubSubSvr:
09-08-2022 05:41:54.623 +0000 INFO HttpPubSubConnection [13137 HttpClientPollingThread_A4B05094-DB53-4495-B31D-853E566CE7E0] - Could not obtain connection, will retry after=43.540 seconds.
09-08-2022 05:42:04.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:42:12.104 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out


sylim_splunk
Splunk Employee

The HF and DS connection problems appear to be separate issues.

- HF connectivity: the connection is set up, but the UF can no longer send data because 10.236.65.143 is not accepting connections. The HF's data processing queues are most likely full all the way down to its tcpout to the indexer(s).

  To work around or mitigate this, if you have many HFs, try configuring asynchronous forwarding. For reference:

https://www.linkedin.com/pulse/splunk-asynchronous-forwarding-lightning-fast-data-ingestor-rawat/?tr...

- DC (deployment client) connection: first check the deployment client configuration, and add https:// to targetUri if it is not already there.

  Capture traffic with tcpdump to see where it fails: on the client side, on the DS, or because of a DNS failure.
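
To act on the targetUri advice above, it can help to print what the deployment client is actually configured with. This is only a minimal sketch, not a Splunk tool: `get_target_uri` is a hypothetical helper name, and it simply reads whichever deploymentclient.conf-style file you pass it.

```shell
# Hypothetical helper: print every targetUri value found in a
# deploymentclient.conf-style file passed as the first argument.
get_target_uri() {
    # Match "targetUri = value", tolerate surrounding whitespace,
    # and print only the value. Commented-out lines do not match.
    sed -n 's/^[[:space:]]*targetUri[[:space:]]*=[[:space:]]*//p' "$1"
}
```

For example, `get_target_uri /opt/splunkforwarder/etc/system/local/deploymentclient.conf` would print the configured DS address. That path is an assumption; the setting may instead live in an app directory. For the packet capture, something like `tcpdump -nn -i any port 8089` on the UF while it retries the handshake should show whether connection attempts leave the host and whether anything comes back.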

 

lskaariwala
Loves-to-Learn Lots

Also, run a telnet-style check to verify connectivity to both ports:

curl -v telnet://ip:8089

curl -v telnet://ip:9997
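
If curl is not available on the UF, the same check can be sketched with bash's /dev/tcp pseudo-device. This is an alternative to the curl commands above, not a replacement for them; the function names are hypothetical, and it assumes bash and the coreutils `timeout` command are installed.

```shell
# Hypothetical helper: return 0 if a TCP connection to host ($1) on
# port ($2) succeeds within 3 seconds, non-zero otherwise.
check_port() {
    # bash's /dev/tcp opens a TCP socket; timeout bounds the wait so a
    # silently dropped connection attempt does not hang the check.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical wrapper: report both ports from this thread for one host.
check_splunk_ports() {
    for port in 8089 9997; do
        if check_port "$1" "$port"; then
            echo "$1:$port open"
        else
            echo "$1:$port unreachable"
        fi
    done
}
```

Run it from the UF as `check_splunk_ports <ip>`, using the DS or HF address. An "unreachable" result does not distinguish a refused connection from a timeout, so the curl output or a packet capture is still the better source for that detail.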

chaker
Contributor

Assuming you can ping the DS and HF from the UF:

- Have you run a netstat to check those ports are open, and that they are being used for Splunk and not something else?

- Have you checked the splunkd.log file on the DS and HF for any errors relating to these services/ports?

- Have you tried hitting the REST API on either the DS or the HF, using the curl examples from the docs?

https://docs.splunk.com/Documentation/Splunk/latest/RESTUM/RESTusing

- Is there enough disk space on the HF? (By default, indexing pauses when free space drops below 5 GB.)

- Take a look at the issue described here. It's going to be network related at some level.

https://community.splunk.com/t5/Getting-Data-In/How-to-resolve-quot-err-not-connected-quot-error-in-...
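
The disk space question above can be checked with a short script. This is a rough sketch: the helper names are hypothetical, and the 5000 MB default threshold is only an approximation of the 5 GB figure mentioned in the list, so adjust it to whatever your instance is configured with.

```shell
# Hypothetical helper: print free space, in whole megabytes, on the
# filesystem that holds the given path.
free_mb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# Hypothetical helper: compare free space on a path ($1) against a
# threshold in MB ($2, defaulting to 5000).
check_free_space() {
    path=$1
    min_mb=${2:-5000}
    free=$(free_mb "$path")
    if [ "$free" -lt "$min_mb" ]; then
        echo "LOW: ${free} MB free on $path (threshold ${min_mb} MB)"
        return 1
    fi
    echo "OK: ${free} MB free on $path"
}
```

On the HF this would be run against the filesystem that holds the Splunk installation and its data, for example `check_free_space /opt/splunk`. That path is an assumption, since the thread only shows /opt/splunkforwarder on the UF.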

Hope this helps.

 


BalaMandalapu24
Loves-to-Learn Everything

1. Ping works in both directions.

2. All the Splunk instances are in the same subnet, with no restrictions.

3. The local iptables firewall is disabled/stopped.

4. Using free trial versions of the UF, DS, HF, and indexer.

5. Telnet/SSH does not work in either direction between these servers.

6. Added a static route; still the same problem.

7. Disk space is available, since these are newly built servers.


lskaariwala
Loves-to-Learn Lots

What Splunk version are you running? Are all of them on the latest version 9?


BalaMandalapu24
Loves-to-Learn Everything

splunk-9.0.1-82c987350fde-Linux-x86_64.tgz

splunkforwarder-9.0.0.1-9e907cedecb1-linux-2.6-ppc64le.rpm 

Yes, all are on the same version.

 

 
