My UF is configured to talk to a deployment server on port 8089 and to an HF on port 9997, and neither connection is working.
Troubleshooting steps performed:
1. Disabled the iptables firewall.
2. All servers are in the same subnet, so I don't believe there is a network firewall issue.
3. Configured outputs.conf under /opt/splunkforwarder/etc/system/local.
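(For reference, a minimal outputs.conf for this setup would look roughly like the following; the target IP and the output group name "HF" are taken from the splunkd.log output below, so this is a sketch rather than necessarily the exact contents of my file.)

[tcpout]
defaultGroup = HF

[tcpout:HF]
server = 10.236.65.143:9997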
[root@gcpas-d-sial02 ~]# tail -f /opt/splunkforwarder/var/log/splunk/splunkd.log
09-08-2022 05:41:22.535 +0000 WARN TcpOutputProc [13177 parsing] - The TCP output processor has paused the data flow. Forwarding to host_dest=10.236.65.143 inside output group HF from host_src=gcpas-d-sial02 has been blocked for blocked_seconds=34900. This can stall the data flow towards indexing and other network outputs. Review the receiving system's health in the Splunk Monitoring Console. It is probably not accepting data.
09-08-2022 05:41:28.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:28.180 +0000 INFO DC:PhonehomeThread [13138 PhonehomeThread] - Attempted handshake 2910 times. Will try to re-subscribe to handshake reply
09-08-2022 05:41:32.336 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out
09-08-2022 05:41:40.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:52.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:41:52.247 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out
09-08-2022 05:41:54.623 +0000 WARN HttpPubSubConnection [13137 HttpClientPollingThread_A4B05094-DB53-4495-B31D-853E566CE7E0] - Unable to parse message from PubSubSvr:
09-08-2022 05:41:54.623 +0000 INFO HttpPubSubConnection [13137 HttpClientPollingThread_A4B05094-DB53-4495-B31D-853E566CE7E0] - Could not obtain connection, will retry after=43.540 seconds.
09-08-2022 05:42:04.180 +0000 INFO DC:DeploymentClient [13138 PhonehomeThread] - channel=tenantService/handshake Will retry sending handshake message to DS; err=not_connected
09-08-2022 05:42:12.104 +0000 WARN AutoLoadBalancedConnectionStrategy [13178 TcpOutEloop] - Raw connection to ip=10.236.65.143:9997 timed out
The HF and DC issues appear to be two separate problems.
- The HF connectivity: the output to 10.236.65.143 has been blocked for a long time (blocked_seconds=34900) and new raw connections to port 9997 are timing out, so the UF can no longer send data. The HF is most likely backed up, with its data processing queues full all the way through to its tcpout towards the downstream indexer(s).
How to work around / improve this: if you have multiple HFs, try configuring asynchronous forwarding (see the Splunk documentation on forwarding for details).
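One quick way to confirm that the HF queues are the bottleneck is to look at the queue metrics on the HF itself. A sketch, assuming a default /opt/splunk install on the HF:

# on the HF: look for queues reporting blocked=true in the queue metrics
grep 'group=queue' /opt/splunk/var/log/splunk/metrics.log | grep 'blocked=true' | tail -20

If the parsing/indexing/tcpout queues keep showing blocked=true, the HF cannot drain data to its own destination, which matches the blocked_seconds warning on the UF.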
- The DC connection issue: first, check the deployment client configuration, and add https:// to targetUri if it doesn't have it yet.
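That setting lives in deploymentclient.conf on the UF; a sketch of the relevant stanza, with <ds-host> as a placeholder for your deployment server's address (the https:// form is the suggestion above; it is also commonly written as just host:port):

[target-broker:deploymentServer]
targetUri = https://<ds-host>:8089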
Try capturing a tcpdump to see where it fails: on the client side, on the DS, or possibly a DNS failure.
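For the capture, something along these lines on the UF should show whether the SYNs are being answered (<ds-ip> is a placeholder; 10.236.65.143 is the HF address from the logs):

tcpdump -nn -i any host <ds-ip> and port 8089
tcpdump -nn -i any host 10.236.65.143 and port 9997

A SYN with no SYN-ACK usually points to a firewall or the remote host; an immediate RST points to nothing listening on that port.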
Also, perform telnet and verify connectivity:
curl -v telnet://ip:8089
curl -v telnet://ip:9997
You might want to look at the certificates as well.
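A quick way to see what certificate (if any) is being presented is openssl; a sketch with placeholder addresses. Note that 9997 only speaks TLS if SSL forwarding has been configured, so seeing no certificate there is normal on a default install:

openssl s_client -connect <ds-ip>:8089 </dev/null
openssl s_client -connect 10.236.65.143:9997 </dev/null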
Assuming you can ping the DS / HF from the UF.
- Have you run netstat to check that those ports are open, and that they are being used by Splunk and not something else? (Example commands after this list.)
- Have you checked the splunkd.log file on the DS and HF for any errors relating to these services/ports?
- Have you tried hitting the REST API on either the DS or HF using the curl examples from the docs?
https://docs.splunk.com/Documentation/Splunk/latest/RESTUM/RESTusing
- Is there enough disk space on the HF? (Less than 5 GB free will pause indexing by default.)
- Take a look at the issue described here. It's going to be network related at some level.
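Example commands for the netstat, REST, and disk-space checks above (run on the DS/HF as appropriate; credentials and addresses are placeholders):

# which process is listening on the Splunk ports
netstat -tlnp | grep -E ':8089|:9997'
# basic REST call against the management port
curl -k -u admin:<password> https://<ds-ip>:8089/services/server/info
# free disk space where Splunk is installed
df -h /opt/splunk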
Hope this helps.
1. Ping works from both sides.
2. All the Splunk instances are in the same subnet; there are no restrictions.
3. The local iptables firewall is disabled/stopped.
4. Using free trial versions for the UF, DS, HF, and indexer servers.
5. Telnet/SSH does not work in either direction between these servers.
6. Added a static route; still the same problem.
7. Disk space is available since these are newly built servers.
What Splunk version are you running? Are all of them on the latest version 9?
splunk-9.0.1-82c987350fde-Linux-x86_64.tgz
splunkforwarder-9.0.0.1-9e907cedecb1-linux-2.6-ppc64le.rpm
Yes, all are on the same version.