Search Heads are unable to distribute to Indexers

Splunk Employee

We frequently see an error message saying that the search head cannot connect to an indexer:

"Unable to distribute to peer named xx.xx.xx.xx:8089 at uri=xx.xx.xx.xx:8089 using the uri-scheme=https because peer has status=Down"

It happens intermittently, and any search head can show the error. The connection problem can affect any indexer in the cluster; it is not restricted to a particular indexer. In the worst case, the search head reports the error for all indexers, which causes a service outage.

After some time, without any intervention, the service returns to normal.

CPU and memory usage remain normal even while the error is occurring.


Splunk Employee

1) Increase the timeouts.

In distsearch.conf on the search head:

statusTimeout = 120
connectionTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120

connectionTimeout = 120
sendRcvTimeout = 120

On the indexers, in distsearch.conf:

connectionTimeout = 120
sendRcvTimeout = 120

and in server.conf:

busyKeepAliveIdleTimeout = 120

This change improved things slightly, but the error message still appeared from time to time.
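For reference, the step 1 settings might be laid out by file and stanza roughly as follows. The post does not name the stanzas, so the [distributedSearch], [replicationSettings], and [httpServer] headers below are assumptions based on the distsearch.conf and server.conf spec files; check them against the spec for your Splunk version before applying anything.

```ini
# distsearch.conf on the search head
# (stanza names are assumed; the post lists only the keys)
[distributedSearch]
statusTimeout = 120
connectionTimeout = 120
authTokenConnectionTimeout = 120
authTokenSendTimeout = 120
authTokenReceiveTimeout = 120

[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120

# distsearch.conf on the indexers (stanza name assumed)
[replicationSettings]
connectionTimeout = 120
sendRcvTimeout = 120

# server.conf on the indexers (stanza name assumed)
[httpServer]
busyKeepAliveIdleTimeout = 120
```

This layout would also explain why connectionTimeout appears twice in the search head list: the same key can exist in two different stanzas.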

2) The pstack output shows that the I/O thread is busy with SSL handshakes. The slow SSL operations delay processing and cause the timeouts.

3) SSL operations (especially compression) are CPU-intensive routines. Because they are invoked within the main I/O thread (management port), they slow down I/O operations.

4) This is a known limitation of the OpenSSL design: compression is done during the write operation, which blocks I/O.

5) Disable SSL client compression on the search head:

useClientSSLCompression = false

6) After SSL client compression was disabled on the search head, the system returned to normal.
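Step 5 does not say which file holds the setting. A minimal sketch, assuming it belongs in the [sslConfig] stanza of server.conf on the search head (that placement is my reading of the server.conf spec, not something stated in the post):

```ini
# server.conf on the search head
# (file and stanza are assumed; the post gives only the key and value)
[sslConfig]
useClientSSLCompression = false
```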



Hello @tlam,

Great analysis!

  • 5 - Does disabling SSL compression increase the network transfer time, and does that require increasing the timeouts even further?

  • 3 - Can you please check whether the crypto in use is hardware accelerated?

    openssl engine -t -c

    cat /proc/crypto

Can you please post /proc/cpuinfo and /etc/*elease* ?

I cannot find the right reference right now, can you please check
