Index Cluster configuration suggestions

TJSplunker · 12-17-2020

Hi everyone,

We are having trouble with index cluster stability, and I was given the following configuration changes to make to our index cluster.

However, I am troubled because this is A LOT of changes. The person who suggested them said that a Splunk deployment must be tuned as it grows (which is logical enough), but I am still troubled by the sheer number of suggested changes. We have three indexers in our cluster.

I wanted to throw these configurations out there as fodder and see what you guys come back with.

Thanks!

In server.conf on each indexer:

[clustering]

cxn_timeout = 300

send_timeout = 300

rcv_timeout = 300

heartbeat_period = 10

[httpServer]

busyKeepAliveIdleTimeout = 180

streamInWriteTimeout = 30

[sslConfig]

useClientSSLCompression = False

In server.conf on the Cluster Master:

[clustering]

executor_workers = 16

heartbeat_timeout = 300

cxn_timeout = 300

send_timeout = 300

rcv_timeout = 300

max_peer_build_load = 5

max_fixup_time_ms = 5000

max_peers_to_download_bundle = 5

[httpServer]

busyKeepAliveIdleTimeout = 180

streamInWriteTimeout = 30

[sslConfig]

useClientSSLCompression = false

In distsearch.conf on indexers and cluster master:

[replicationSettings]

sendRcvTimeout = 120

In distsearch.conf on all search heads:

statusTimeout = 120

connectionTimeout = 120

authTokenConnectionTimeout = 120

authTokenSendTimeout = 120

authTokenReceiveTimeout = 120

#receiveTimeout = 120

[replicationSettings]

connectionTimeout = 120

sendRcvTimeout = 120

In server.conf on the search heads:

[sslConfig]

useClientSSLCompression = false

See what I mean? That's a lot of changes! So many that it makes me surprised and a little uncomfortable. If anybody has specific experience with these settings, please let me know.

Thanks!

-TJ

esix_splunk · 12-22-2020

Most of these come from the Splunk Cloud configurations and are the ones we've seen provide the best performance. They are also the configurations we've seen work best for customers scaling to 20 TB+ a day and well beyond.

They can definitely help with rolling restarts in large environments. If you're unsure whether to apply them, you can open a support ticket and ask Support to validate the configuration changes (assuming they're not the ones who gave you the configs).

richgalloway · 12-18-2020

I don't know that any of those settings will fix the problem, but they don't seem unreasonable. I'm not sure I'd go from 1 minute to 5 minutes for the timeouts, however. Maybe 3 minutes for starters.

Also, the receiveTimeout setting is commented out, so that change will have no effect.
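To illustrate the point about the commented-out line, here is a minimal Python sketch. It uses the standard-library `configparser` as a stand-in for Splunk's own .conf parser (an assumption: the two are not identical, but both treat a line starting with `#` as a comment). The `[distributedSearch]` stanza header is also an assumption, added because `configparser` requires a header; the original post lists these settings without one. The helper name `active_settings` is made up for this example.

```python
import configparser

# Fragment of the suggested distsearch.conf for search heads.
# The [distributedSearch] header is assumed; the post omits it.
FRAGMENT = """
[distributedSearch]
statusTimeout = 120
connectionTimeout =120
authTokenReceiveTimeout = 120
#receiveTimeout = 120
"""

def active_settings(text, stanza):
    """Return the settings in `stanza` that are not commented out."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the camelCase setting names as written
    parser.read_string(text)
    return dict(parser[stanza])

settings = active_settings(FRAGMENT, "distributedSearch")
print(sorted(settings))
print("receiveTimeout" in settings)
```

If receiveTimeout is meant to change, the leading `#` has to be removed; otherwise the current value stays in force.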

---
If this reply helps you, Karma would be appreciated.

richgalloway · 12-17-2020

What trouble are you having that these changes are supposed to fix? Are these all new settings or changes to existing ones? If the latter, please share the current values.


TJSplunker · 12-18-2020

Hi Rich,

These changes are supposed to fix timeouts in communication between cluster nodes in larger index clusters. However, my main issue is that my index cluster always has problems during a rolling restart (and the problems seem to be different each time). Maybe this will alleviate some of them.

Here is the data you requested. For each setting, I have displayed the data in the format:

'cxn_timeout = current value/suggested value'

In server.conf on each indexer:

[clustering]

cxn_timeout = 60/300

send_timeout = 60/300

rcv_timeout = 60/300

heartbeat_period = 1/10

[httpServer]

busyKeepAliveIdleTimeout = 12/180

streamInWriteTimeout = 5/30

[sslConfig]

useClientSSLCompression = True/False

In server.conf on the Cluster Master:

[clustering]

executor_workers = 10/16

heartbeat_timeout = 60/300

cxn_timeout = 60/300

send_timeout = 60/300

rcv_timeout = 60/300

max_peer_build_load = 2/5

max_fixup_time_ms = 5000/5000

max_peers_to_download_bundle = 5/5

[httpServer]

busyKeepAliveIdleTimeout = 12/180

streamInWriteTimeout = 5/30

[sslConfig]

useClientSSLCompression = true/false

In distsearch.conf on indexers and cluster master:

[replicationSettings]

sendRcvTimeout = 60/120

In distsearch.conf on all search heads:

statusTimeout = 10/120

connectionTimeout = 10/120

authTokenConnectionTimeout = 5/120

authTokenSendTimeout = 10/120

authTokenReceiveTimeout = 10/120

#receiveTimeout = 600/120

[replicationSettings]

connectionTimeout = 60/120

sendRcvTimeout = 60/120

In server.conf on the search heads:

[sslConfig]

useClientSSLCompression = true/false
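
One way to gauge how much is really changing is to parse the `setting = current/suggested` lines above and compare the two values. The Python sketch below does this for the cluster master's [clustering] stanza only. The function name `changed_settings` is made up for illustration, and it handles nothing beyond the ad hoc format used in this post.

```python
# Lines copied from the cluster master [clustering] stanza above,
# in the "setting = current/suggested" format used in this post.
LISTING = """
executor_workers = 10/16
heartbeat_timeout = 60/300
cxn_timeout = 60/300
send_timeout = 60/300
rcv_timeout = 60/300
max_peer_build_load = 2/5
max_fixup_time_ms = 5000/5000
max_peers_to_download_bundle = 5/5
"""

def changed_settings(listing):
    """Split settings into (changed, unchanged) by comparing current vs suggested."""
    changed, unchanged = {}, []
    for line in listing.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out settings
        name, _, values = line.partition("=")
        current, _, suggested = values.strip().partition("/")
        if current.lower() == suggested.lower():
            unchanged.append(name.strip())
        else:
            changed[name.strip()] = (current, suggested)
    return changed, unchanged

changed, unchanged = changed_settings(LISTING)
print("unchanged:", unchanged)
for name, (current, suggested) in changed.items():
    print(f"{name}: {current} -> {suggested}")
```

For this stanza, two of the eight settings (max_fixup_time_ms and max_peers_to_download_bundle) already match the suggested values, so they are not changes at all.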
