splunkd.log on the deployment clients (DCs) shows 503 errors and messages that the DC can't connect.
This happens because, with the default deployment server (DS) configuration, the DS has reached a saturation point from handling too many phone-home checks. The following recommendations increase DS scalability.
On the DS, in server.conf:
[sslConfig]
sslServerSessionTimeout = 7200
[httpServer]
dedicatedIoThreads = 10
In $SPLUNK_HOME/etc/splunk-launch.conf on the DS, set the following to 512:
SPLUNK_LISTEN_BACKLOG = 512
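Note: Make sure the Linux net.core.somaxconn setting is higher than SPLUNK_LISTEN_BACKLOG. The kernel caps a socket's listen backlog at net.core.somaxconn, so a lower value silently cancels out the larger backlog.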
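The check in the note above can be sketched as a small shell script. This is only a minimal sketch: backlog_ok is a hypothetical helper name, the value 512 mirrors the recommendation above, and reading /proc/sys/net/core/somaxconn assumes a Linux host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the kernel limit is higher
# than the Splunk listen backlog.
backlog_ok() {
    # $1 = net.core.somaxconn value, $2 = SPLUNK_LISTEN_BACKLOG value
    [ "$1" -gt "$2" ]
}

# Read the current kernel limit (Linux only); fall back to 0 if unavailable.
somaxconn=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 0)
backlog=512   # value recommended above for SPLUNK_LISTEN_BACKLOG

if backlog_ok "$somaxconn" "$backlog"; then
    echo "OK: net.core.somaxconn=$somaxconn is above SPLUNK_LISTEN_BACKLOG=$backlog"
else
    echo "WARN: raise net.core.somaxconn above $backlog (currently $somaxconn)"
fi
```

If the script warns, the limit can be raised at runtime with something like `sysctl -w net.core.somaxconn=1024` (1024 is only an example value) and made persistent through the system's sysctl configuration.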
On the DC side, in server.conf:
[sslConfig]
useSslClientSessionCache=true
Upgrade all DCs to 7.1.3 or later to get configurable DC timeouts with higher defaults. Before 7.1.3 these timeouts are hardcoded to 5 seconds. The settings and their defaults are:
connect_timeout = <positive integer>
* Default: 60
send_timeout = <positive integer>
* Default: 60
recv_timeout = <positive integer>
* Default: 60
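To find which DCs still need the upgrade, a version comparison along these lines may help. This is a rough sketch under stated assumptions: version_ge is a hypothetical helper, dc_version is a made-up sample value (how you collect each DC's actual version is up to your inventory tooling), and `sort -V` assumes a sort implementation with version-sort support, such as GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: true when version $1 is greater than or equal to $2.
# Relies on "sort -V" (version sort), so 7.1.10 correctly sorts after 7.1.3.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="7.1.3"      # first release with configurable DC timeouts, per the text above
dc_version="7.1.2"    # hypothetical sample; substitute the version reported by a DC

if version_ge "$dc_version" "$required"; then
    echo "$dc_version: timeouts are configurable"
else
    echo "$dc_version: upgrade to $required or later (timeouts hardcoded to 5 sec)"
fi
```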