Deployment Architecture

Index Cluster configuration suggestions

TJSplunker
Engager

Hi everyone,

We are having trouble with index cluster stability and I was given these configuration changes to make to our Index Cluster. 

However, I am troubled because these are A LOT of changes. The person who suggested them said that a Splunk deployment must be tuned as it grows (and that is logical enough), but I am still troubled by the sheer number of suggested configuration changes. We have three indexers in our cluster.

I wanted to throw these configurations out there as fodder and see what you guys come back with.

Thanks!

In server.conf on each indexer:

[clustering]
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
heartbeat_period = 10
 
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
 
[sslConfig]
useClientSSLCompression = False

 

In server.conf on the Cluster Master:

[clustering]
executor_workers = 16
heartbeat_timeout = 300
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
max_fixup_time_ms = 5000
max_peers_to_download_bundle = 5
 
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
 
[sslConfig]
useClientSSLCompression = false

 

In distsearch.conf on indexers and cluster master:

[replicationSettings]
sendRcvTimeout = 120

 

In distsearch.conf on all search heads:

statusTimeout = 120
connectionTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120
#receiveTimeout = 120
 
[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120

 

In server.conf on the search heads:

[sslConfig]
useClientSSLCompression = false

 

See what I mean? That's a lot of changes! So many that it makes me surprised and a little uncomfortable. If anybody has any specific experience with these settings, please let me know.

Thanks!

-TJ


esix_splunk
Splunk Employee

Most of these come from the Splunk Cloud configurations and are the settings we've seen provide the best performance. They are also the configurations we've seen work best for customers scaling to 20 TB+ a day and well beyond.

 

They can definitely help with rolling restarts in large environments. If you're unsure whether to apply these or not, you can open a support ticket and ask them to validate the configuration changes. (Assuming they're not the ones who gave you the configs.)

richgalloway
SplunkTrust

I don't know that any of those settings will fix the problem, but they don't seem unreasonable.  I'm not sure I'd go from 1 minute to 5 minutes for the timeouts, however.  Maybe 3 for starters.
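To put the timeout choice in numbers, here is a rough Python sketch. It assumes that peers send a heartbeat every heartbeat_period seconds and that the master marks a peer down after heartbeat_timeout seconds without one. The function name missed_heartbeats is made up for illustration; it is not a Splunk API.

```python
def missed_heartbeats(heartbeat_timeout, heartbeat_period):
    """Number of whole heartbeat intervals that fit inside the timeout window."""
    return heartbeat_timeout // heartbeat_period


# Compare 1, 3 and 5 minute timeouts against a 1 s and a 10 s heartbeat period.
for period in (1, 10):
    for timeout in (60, 180, 300):
        intervals = missed_heartbeats(timeout, period)
        print(f"period={period}s timeout={timeout}s -> {intervals} intervals")
```

With a 10 second period, a 3 minute timeout still leaves room for 18 missed intervals before a peer would be considered down, and 5 minutes leaves room for 30.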

Also, the receiveTimeout setting is commented out, so that change will have no effect.
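As a quick illustration of the commented-out line, here is a small Python sketch that uses the standard configparser module as a stand-in for Splunk's own .conf parsing (an approximation, not the real parser). The [distributedSearch] stanza name is an assumption, since the post lists those settings without a stanza header, and active_settings is a hypothetical helper.

```python
import configparser

# Part of the suggested distsearch.conf for the search heads.
# ASSUMPTION: the [distributedSearch] stanza header is not in the original post.
SNIPPET = """
[distributedSearch]
statusTimeout = 120
connectionTimeout = 120
#receiveTimeout = 120
"""


def active_settings(text, stanza):
    """Return the settings in `stanza` that are not commented out."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original case of setting names
    parser.read_string(text)
    return dict(parser[stanza])


settings = active_settings(SNIPPET, "distributedSearch")
print(settings)
```

Only statusTimeout and connectionTimeout come back; the line starting with # is treated as a comment, so receiveTimeout keeps whatever value it already had.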

---
If this reply helps you, an upvote would be appreciated.

richgalloway
SplunkTrust

What trouble are you having that these changes are supposed to fix?  Are these all new settings or changes to existing ones?  If the latter then please share the current values.


TJSplunker
Engager

Hi Rich,

These changes are supposed to fix timeouts in communication between cluster nodes in larger index clusters. However, my main issue is that my index cluster always has problems during a rolling restart, and the problems seem to be different each time. Maybe this will alleviate some of them.

Here is the data you requested. For each setting, I have displayed the data in the format:

'cxn_timeout = current value/suggested value'

In server.conf on each indexer:

[clustering]
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
heartbeat_period = 1/10
 
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
 
[sslConfig]
useClientSSLCompression = True/False

 

In server.conf on the Cluster Master:

[clustering]
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
 
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
 
[sslConfig]
useClientSSLCompression = true/false

 

In distsearch.conf on indexers and cluster master:

[replicationSettings]
sendRcvTimeout = 60/120

 

In distsearch.conf on all search heads:

statusTimeout = 10/120
connectionTimeout = 10/120
authTokenConnectionTimeout = 5/120
authTokenSendTimeout = 10/120
authTokenReceiveTimeout = 10/120
#receiveTimeout = 600/120
 
[replicationSettings]
connectionTimeout = 60/120
sendRcvTimeout = 60/120

 

In server.conf on the search heads:

[sslConfig]
useClientSSLCompression = true/false
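
Since not every line in the lists above is an actual change, a short Python sketch like the following can filter the "current/suggested" pairs down to the settings that really differ. The names changed_settings and CLUSTER_MASTER are made up for this example, and the input is just the Cluster Master [clustering] lines copied from this post.

```python
def changed_settings(lines):
    """Yield (name, current, suggested) for settings whose value actually changes."""
    for line in lines:
        line = line.strip()
        # Skip blanks, stanza headers, and commented-out settings.
        if not line or line.startswith(("[", "#")) or "=" not in line:
            continue
        name, _, pair = line.partition("=")
        current, _, suggested = pair.partition("/")
        current, suggested = current.strip(), suggested.strip()
        # Compare case-insensitively so True/true count as the same value.
        if current.lower() != suggested.lower():
            yield name.strip(), current, suggested


CLUSTER_MASTER = """
[clustering]
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
""".splitlines()

changes = list(changed_settings(CLUSTER_MASTER))
for name, current, suggested in changes:
    print(f"{name}: {current} -> {suggested}")
```

For the Cluster Master stanza this leaves six real changes; max_fixup_time_ms and max_peers_to_download_bundle already match the suggested values.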