Deployment Architecture

Index Cluster configuration suggestions

TJSplunker
Engager

Hi everyone,

We are having trouble with index cluster stability and I was given these configuration changes to make to our Index Cluster. 

However, I am troubled because this is A LOT of changes. The person who suggested them said that a Splunk deployment must be tuned as it grows (which is logical enough), but I am still troubled by the sheer number of suggested changes. We have three indexers in our cluster.

I wanted to throw these configurations out there as fodder and see what you guys come back with.

Thanks!

In server.conf on each indexer:

[clustering]
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
heartbeat_period = 10
 
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
 
[sslConfig]
useClientSSLCompression = False

 

In server.conf on the Cluster Master:

[clustering]
executor_workers = 16
heartbeat_timeout = 300
cxn_timeout = 300
send_timeout = 300
rcv_timeout = 300
max_peer_build_load = 5
max_fixup_time_ms = 5000
max_peers_to_download_bundle = 5
 
[httpServer]
busyKeepAliveIdleTimeout = 180
streamInWriteTimeout = 30
 
[sslConfig]
useClientSSLCompression = false

 

In distsearch.conf on indexers and cluster master:

[replicationSettings]
sendRcvTimeout = 120

 

In distsearch.conf on all search heads:

statusTimeout = 120
connectionTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120
#receiveTimeout = 120
 
[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120

 

In server.conf on the search heads:

[sslConfig]
useClientSSLCompression = false

 

See what I mean? That's a lot of changes! So many that it makes me surprised and a little uncomfortable. If anybody has any specific experience with these settings, please let me know.

Thanks!

-TJ


esix_splunk
Splunk Employee

Most of these come from the Splunk Cloud configurations and are the values we've seen provide the best performance. They are also the settings we've seen work best for customers scaling to 20 TB+ a day and well beyond.

 

They can definitely help with rolling restarts in large environments. If you're unsure whether to apply these, you can open a support ticket and ask Support to validate the configuration changes (assuming they're not the ones who gave you the configs).

richgalloway
SplunkTrust

I don't know that any of those settings will fix the problem, but they don't seem unreasonable. I'm not sure I'd go from 1 minute to 5 minutes for the timeouts, however. Maybe 3 minutes for starters.

Also, the receiveTimeout setting is commented out so no change will be effective.
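
As a rough way to catch settings like the commented-out receiveTimeout noted above, a short script can scan a .conf fragment for lines that look like a setting but start with `#`. This is only a sketch: the function name `commented_out_settings` and the regular expression are assumptions made for illustration, not part of any Splunk tooling, and the sample text is the search head distsearch.conf fragment from the original post.

```python
import re

# Matches lines such as "#receiveTimeout = 120": a '#', then key = value.
# The pattern is an illustrative assumption, not a full .conf grammar.
COMMENTED_SETTING = re.compile(r"^\s*#\s*([A-Za-z_][\w.]*)\s*=\s*(\S.*)$")


def commented_out_settings(conf_text):
    """Return (key, value) pairs for settings that are commented out."""
    found = []
    for line in conf_text.splitlines():
        match = COMMENTED_SETTING.match(line)
        if match:
            found.append((match.group(1), match.group(2).strip()))
    return found


# Fragment copied from the post (search head distsearch.conf suggestions).
SEARCH_HEAD_DISTSEARCH = """\
statusTimeout = 120
connectionTimeout =120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120
#receiveTimeout = 120

[replicationSettings]
connectionTimeout =120
sendRcvTimeout = 120
"""

print(commented_out_settings(SEARCH_HEAD_DISTSEARCH))
# -> [('receiveTimeout', '120')]
```

Anything the script reports is ignored by Splunk as written, so it has to be uncommented before it can take effect.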

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

What trouble are you having that these changes are supposed to fix? Are these all new settings or changes to existing ones? If the latter, then please share the current values.


TJSplunker
Engager

Hi Rich,

These changes are supposed to fix timeouts in communication between cluster nodes in larger index clusters. However, my main issue is that my index cluster always has problems during a rolling restart (and the problems seem to be different each time). Maybe this will alleviate some of them.

Here is the data you requested. For each setting, I have displayed the data in the format:

'cxn_timeout = current value/suggested value'

In server.conf on each indexer:

[clustering]
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
heartbeat_period = 1/10
 
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
 
[sslConfig]
useClientSSLCompression = True/False

 

In server.conf on the Cluster Master:

[clustering]
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
 
[httpServer]
busyKeepAliveIdleTimeout = 12/180
streamInWriteTimeout = 5/30
 
[sslConfig]
useClientSSLCompression = true/false

 

In distsearch.conf on indexers and cluster master:

[replicationSettings]
sendRcvTimeout = 60/120

 

In distsearch.conf on all search heads:

statusTimeout = 10/120
connectionTimeout = 10/120
authTokenConnectionTimeout = 5/120
authTokenSendTimeout = 10/120
authTokenReceiveTimeout = 10/120
#receiveTimeout = 600/120
 
[replicationSettings]
connectionTimeout = 60/120
sendRcvTimeout = 60/120

 

In server.conf on the search heads:

[sslConfig]
useClientSSLCompression = true/false
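
To see which of the listed settings actually change, and by how much, the 'current value/suggested value' lines can be parsed with a few lines of Python. This is a minimal sketch: the helper names `parse_pairs`, `unchanged`, and `growth` are made up for illustration, and the sample text is the Cluster Master [clustering] list from this post.

```python
def parse_pairs(text):
    """Return {setting: (current, suggested)} from 'key = current/suggested' lines."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and stanza headers such as [clustering].
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        current, sep, suggested = value.partition("/")
        if sep:
            pairs[key.strip()] = (current.strip(), suggested.strip())
    return pairs


def unchanged(pairs):
    """Settings whose suggested value equals the current one (case-insensitive)."""
    return sorted(k for k, (cur, new) in pairs.items() if cur.lower() == new.lower())


def growth(pairs):
    """Suggested/current ratio for settings where both values are positive integers."""
    return {
        k: int(new) / int(cur)
        for k, (cur, new) in pairs.items()
        if cur.isdigit() and new.isdigit() and int(cur) > 0
    }


# Copied from the Cluster Master server.conf list in this post.
CLUSTER_MASTER = """\
[clustering]
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
"""

pairs = parse_pairs(CLUSTER_MASTER)
print(unchanged(pairs))            # settings that are not really changes
print(growth(pairs)["cxn_timeout"])  # how many times larger the timeout becomes
```

On this list, max_fixup_time_ms and max_peers_to_download_bundle come out as unchanged, so the real number of changes is smaller than it first looks, and the clustering timeouts grow by a factor of five.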