On a number of CentOS 6 machines with long iptables rulesets spanning multiple chains (details can be provided if required), the universal forwarder (UF) installs fine, but when running this command:
/opt/splunkforwarder/bin/splunk set deploy-poll <splunkdeploymentserver.fqdn:8089>
The command times out and eventually throws this warning/error:
Couldn't complete HTTP request: Connection timed out
On other CentOS 6 boxes, still with iptables enabled but without that number of chains, the command works as expected. I've been through the malfunctioning iptables rules and cannot see any conflict or reason for this to fail. Additionally, adding explicit allow rules for the relevant ports, both TCP and UDP, to the top of both the INPUT and OUTPUT chains makes no difference.
And even more bizarrely I can telnet to the splunk deployment server over port 8089 successfully.....
Running nmap from one of the affected clients shows the following ports open on the deployment server:
Starting Nmap 5.51 ( http://nmap.org ) at 2019-11-08 13:41 GMT
Nmap scan report for splunkdeployment.fqdn (ip-address)
Host is up (0.0017s latency).
Not shown: 65533 closed ports
PORT     STATE SERVICE
443/tcp  open  https
8089/tcp open  unknown
9997/tcp open  unknown
As soon as I disable iptables, however, I can run the set deploy-poll command successfully.
Has anyone encountered this sort of behaviour before?
Sounds like your iptables somehow does allow the connection to be established (SYN/ACK packets), but does not allow data to be sent/returned? Perhaps check with netstat and tcpdump on the deploymentserver whether you do see connections getting established and see what happens after that.
Thanks FrankVI - given that the set deploy-poll command works straight away if I disable iptables on the problem machine(s), it seems clear that iptables is at fault on the servers running the UF, not on the deployment server. I don't have access to the console on the deployment server to check netstat etc. but can get one of my colleagues to do so on Monday.
I had added logging too, using this command:
sudo iptables -I INPUT 1 -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
But when checking the logs after another run of the set deploy-poll command there is nothing about dropped packets... nothing in the logs at all, come to that.
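One possible reason for the silence: a LOG rule with --log-level 7 emits at kernel debug priority, and a stock CentOS 6 rsyslog config (*.info to /var/log/messages) discards debug-level messages. The kernel ring buffer shows them regardless, so this is worth a quick check (the prefix matches the rule above):

```shell
# LOG target output at --log-level 7 goes out at kern.debug priority; a default
# syslog config may never write that to /var/log/messages. The kernel ring
# buffer is a more direct place to look:
LOG_PREFIX='iptables denied:'
dmesg | grep -F "$LOG_PREFIX" || echo "no '$LOG_PREFIX' entries in the ring buffer"
```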
You could check netstat/tcpdump also on the affected deployment clients if that is easier. Without having the actual ruleset it is a bit difficult to give any solutions other than suggestions on how to troubleshoot in more detail.
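For reference, a sketch of that check on a deployment client (the hostname is taken from this thread and port 8089 from the defaults above; adjust both for your environment):

```shell
# Watch the DC -> DS traffic on the forwarder while re-running the failing
# command in another terminal (hostname and port are assumptions):
CAPTURE_FILTER='host splunkdeployment.mydomain.com and tcp port 8089'
echo "capture filter: $CAPTURE_FILTER"
#   sudo tcpdump -i any -nn "$CAPTURE_FILTER"
# Then check whether any sessions actually reach ESTABLISHED:
#   netstat -tnp | grep 8089
```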
I tried the tcpdump approach and could not see any traffic at all to the deployment server, with iptables either enabled or disabled. I tried pinpointing just port 8089 and no packets were captured at all. So I think there is possibly something else at play here.
All the deploy-poll commands failed, not just the set command. I'd expect set to fail if communication was being blocked by the iptables rules, but the show command also failed (and still does) when iptables is enabled.
As a workaround I manually created this file
With the following contents:
[target-broker:deploymentServer]
targetUri = splunkdeployment.mydomain.com:8089
And the deployment client is now showing up on the deployment server as managed. I'm going to try the above workaround on the remaining problem machines and see how far that gets me - not ideal, but if it works it works (and it may help someone else who finds this).
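If anyone copies this workaround: splunkd only reads deploymentclient.conf at startup, so a restart is needed after creating the file by hand, and btool can confirm the setting was picked up (paths assume a default UF install, as earlier in the thread):

```shell
# splunkd reads deploymentclient.conf at startup, so restart after editing:
#   sudo /opt/splunkforwarder/bin/splunk restart
# btool shows the effective config and which file each setting comes from:
#   sudo /opt/splunkforwarder/bin/splunk btool deploymentclient list --debug
STANZA='target-broker:deploymentServer'
echo "expect stanza [$STANZA] in the btool output"
```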
Interesting. So it seems it is the splunk CLI commands that are failing, not the actual DC -> DS communication. Could it be that the CLI command, under the hood, performs some local REST call to the splunk daemon, which gets blocked by iptables (just wildly thinking out loud here)? Might be something to check with Splunk Support.
Using config files instead of the CLI may not be a bad idea anyway for configuring the deployment client. By putting this config into a small app, you can even manage it from the DS later on, in case you want to tune the phone-home interval, or even move clients to a different deployment server.
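For example, a minimal app along those lines might look like this; the app name is made up for illustration, while targetUri and phoneHomeIntervalInSecs are standard deploymentclient.conf settings (60 seconds is the default interval):

```ini
# $SPLUNK_HOME/etc/apps/org_all_deploymentclient/default/deploymentclient.conf
[target-broker:deploymentServer]
targetUri = splunkdeployment.mydomain.com:8089

[deployment-client]
# phone-home interval in seconds (60 is the default)
phoneHomeIntervalInSecs = 60
```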