Hi everyone,
We are having trouble with index cluster stability and I was given these configuration changes to make to our Index Cluster.
However, I am troubled because these are A LOT of changes. The person who suggested them said that as your Splunk deployment grows it must be tuned (which is logical enough), but I am still troubled by the sheer number of suggested changes. We have three indexers in our cluster.
I wanted to throw these configurations out there as fodder and see what you guys come back with.
Thanks!
In server.conf on each indexer:
[clustering]
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
heartbeat_period = 10
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
[sslConfig]
useClientSSLCompression = False
In server.conf on the Cluster Master:
[clustering]
executor_workers = 16
heartbeat_timeout = 300
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
max_fixup_time_ms = 5000
max_peers_to_download_bundle = 5
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
[sslConfig]
useClientSSLCompression = false
In distsearch.conf on indexers and cluster master:
[replicationSettings]
sendRcvTimeout = 120
In distsearch.conf on all search heads:
statusTimeout = 120
connectionTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120
#receiveTimeout = 120
[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120
In server.conf on the search heads:
[sslConfig]
useClientSSLCompression = false
See what I mean? That's a lot of changes! So many that it makes me surprised and a little uncomfortable. If anybody has any specific experience with these settings, please let me know.
Thanks!
-TJ
Most of these come from the Splunk Cloud configurations and are the ones we've seen provide the best performance. They are also the configurations we've seen work best for customers scaling to 20 TB+ a day and well beyond.
They can definitely help with rolling restarts in large environments. If you're unsure whether to apply these, you can open a support ticket and ask them to validate the configuration changes (assuming they're not the ones who gave you the configs).
I don't know that any of those settings will fix the problem, but they don't seem unreasonable. I'm not sure I'd go from 1 minute to 5 minutes for the timeouts, however. Maybe 3 for starters.
Also, the receiveTimeout setting is commented out, so that change will have no effect.
What trouble are you having that these changes are supposed to fix? Are these all new settings or changes to existing ones? If the latter then please share the current values.
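For illustration, the "maybe 3 for starters" idea above could look something like the fragment below on an indexer. This is only a sketch: it assumes the three [clustering] timeouts are currently at 60 seconds, and 180 is simply three minutes expressed in seconds, not a tested or recommended value.

```ini
# server.conf on each indexer (sketch, not a validated configuration)
[clustering]
# Intermediate step: 3 minutes instead of jumping from 60 to 300 seconds
cxn_timeout = 180
send_timeout = 180
rcv_timeout = 180
```

The same intermediate value could be tried for the matching timeouts on the Cluster Master, raising it further only if timeouts still show up during rolling restarts.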
Hi Rich,
These changes are supposed to fix timeouts for intra-cluster communication in larger index clusters. However, my main issue is that my index cluster always has issues on a rolling restart (and the issues always seem to be something different each time). Maybe this will alleviate some of those issues.
Here is the data you requested. For each setting, I have displayed the data in the format:
'cxn_timeout = current value/suggested value'
In server.conf on each indexer:
[clustering]
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
heartbeat_period = 1/10
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
[sslConfig]
useClientSSLCompression = True/False
In server.conf on the Cluster Master:
[clustering]
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
[sslConfig]
useClientSSLCompression = true/false
In distsearch.conf on indexers and cluster master:
[replicationSettings]
sendRcvTimeout = 60/120
In distsearch.conf on all search heads:
statusTimeout = 10/120
connectionTimeout = 10/120
authTokenConnectionTimeout = 5/120
authTokenSendTimeout = 10/120
authTokenReceiveTimeout = 10/120
#receiveTimeout = 600/120
[replicationSettings]
connectionTimeout = 60/120
sendRcvTimeout = 60/120
In server.conf on the search heads:
[sslConfig]
useClientSSLCompression = true/false
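Going by the current/suggested pairs above, two of the Cluster Master settings (max_fixup_time_ms at 5000/5000 and max_peers_to_download_bundle at 5/5) already match the suggestion, so they would not be changes at all. A sketch of the Cluster Master [clustering] stanza with only the values that actually differ, assuming the current values reported above are the ones in effect:

```ini
# server.conf on the Cluster Master (sketch: only settings whose
# suggested value differs from the current value listed above)
[clustering]
executor_workers = 16
heartbeat_timeout = 300
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
# max_fixup_time_ms and max_peers_to_download_bundle are omitted
# because their current values already equal the suggested ones
```

Trimming the list this way makes the real size of the change easier to judge, and it leaves fewer explicit overrides to revisit later.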