Explanation of various HTTP(s) timeouts

hrawat_splunk
Splunk Employee

Explanation of the various HTTP(S)-related timeouts and their impact.


hrawat_splunk
Splunk Employee

The settings below are the timeout configs, with an explanation of what each one does and when and how to use it.

Search Head: distsearch.conf, [distributedSearch]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| statusTimeout | 10 sec | 120 sec | Connect/read/write timeout for getting the search peer's info (from SH main splunkd to peer main splunkd). One thread handles many peers. | Check the DistributedPeer* components for WARN/ERROR timeouts. Impacts bundle replication, because peers are marked down while they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle). |
| sendTimeout | 30 sec | 120 sec | Send/write timeout for the SH search process to the peer's main splunkd. One thread handles many peers. | Check search.log for send/write failures. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle). |
| receiveTimeout | 600 sec | 1200 sec | Read/receive timeout for the SH search process to the peer's main splunkd. One thread handles many peers. | Check search.log for read failures. |
| connectionTimeout | 10 sec | 120 sec | Connect timeout for the SH search process to the peer's main splunkd. One thread handles many peers. | Check search.log for connect failures. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle). |
| authTokenConnectionTimeout | 5 sec | 120 sec | Connect timeout for getting the search peer's auth token (from SH main splunkd to peer main splunkd). One thread handles many peers. | Check the DistributedPeer* components for WARN/ERROR timeouts. Impacts bundle replication, because peers are marked down while they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle). |
| authTokenSendTimeout | 10 sec | 120 sec | Send/write timeout for getting the search peer's auth token (from SH main splunkd to peer main splunkd). One thread handles many peers. | Check the DistributedPeer* components for WARN/ERROR timeouts. Impacts bundle replication, because peers are marked down while they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle). |
| authTokenReceiveTimeout | 10 sec | 120 sec | Read/receive timeout for getting the search peer's auth token (from SH main splunkd to peer main splunkd). One thread handles many peers. | Check the DistributedPeer* components for WARN/ERROR timeouts. Impacts bundle replication, because peers are marked down while they are up and running. |
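
As a rough sketch of how the [distributedSearch] settings above might be raised on a search head whose peers are being marked down while they are actually up, the fragment below sets the status and auth-token timeouts to values between the default and the max recommended. The setting names and limits come from the table; the specific numbers are illustrative assumptions, not tuned recommendations.

```
# distsearch.conf on the search head
# Illustrative values only; stay at or below the max recommended (120 sec).
[distributedSearch]
# default 10
statusTimeout = 60
# default 5
authTokenConnectionTimeout = 30
# default 10
authTokenSendTimeout = 60
# default 10
authTokenReceiveTimeout = 60
```

If the timeouts are caused by a busy mgmt port on the peer, it is worth checking the peer's metrics.log dutycycle as well as raising these values.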

Search Head: distsearch.conf, [replicationSettings]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| connectionTimeout | 60 sec | 120 sec | Connect timeout for actual bundle replication from SH main splunkd to peer main splunkd. One thread handles one peer. | Impacts actual bundle replication. Normally this never fails. |
| sendRcvTimeout | 60 sec | 120 sec | Read/write timeout for actual bundle replication from SH main splunkd to peer main splunkd. One thread handles one peer. | Impacts actual bundle replication. Normally this never fails. |

Search Head: limits.conf, [search]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| remote_timeline_connection_timeout | 5 sec | 30 sec | Connect timeout for the SH search process to the peer's main splunkd for the timeliner. Multi-threaded. | Normally this never fails. |
| remote_timeline_send_timeout | 10 sec | 30 sec | Send timeout for the SH search process to the peer's main splunkd for the timeliner. Multi-threaded. | Normally this never fails. |

Search Head: server.conf, [shclustering]

| Config | Default | Max recommended | Purpose |
|---|---|---|---|
| cxn_timeout_raft | 2 sec | 30 sec | SH to SH raft communication connect timeout. One thread handles one member. |
| send_timeout_raft | 5 sec | 30 sec | SH to SH raft communication send timeout. One thread handles one member. |
| rcv_timeout_raft | 5 sec | 30 sec | SH to SH raft communication read timeout. One thread handles one member. |
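
For the raft timeouts above, a minimal sketch of a server.conf override could look like the following. It assumes the same values are applied on every search head cluster member, and the numbers are illustrative picks between the default and the 30 sec max recommended.

```
# server.conf on each search head cluster member (assumed placement)
# Illustrative values only; max recommended is 30 sec for all three.
[shclustering]
# default 2
cxn_timeout_raft = 10
# default 5
send_timeout_raft = 15
# default 5
rcv_timeout_raft = 15
```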
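
Indexer: distsearch.conf, [replicationSettings]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| sendRcvTimeout | 60 sec | 120 sec | Read/write timeout used by the search peer for bundle replication (from peer main splunkd to SH main splunkd). | Impacts actual bundle replication. |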

Indexer/Search Head: server.conf, [httpServer]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| busyKeepAliveIdleTimeout | 12 sec | 120 sec | SH peer/member HTTP server keep-alive connection idle timeout. | When the number of HTTP connections is > 60, the server disconnects any idle connection. Impacts bundle replication and search. |
| streamInWriteTimeout | 5 sec | 30 sec | Read/write timeout used by the HTTP server when receiving an HTTP request. | Normally this never fails. |
| dedicatedIoThreads | auto | 5 | The HTTP listener thread lets additional I/O threads do the reads and writes on the socket. | Very useful for knowledge replication, config replication, raft, search artifact replication, and searches. See the notes below. |
| dedicatedIoThreadsSelectionPolicy | round_robin | weighted_random | | |

Notes on dedicatedIoThreads:

Use it when metrics.log shows that mgmt_httpd is consistently > 0.8:

index=_internal source=*metrics.log* mgmt_httpd | timechart span=30s max(mgmt_httpd)

To see the ratio of the dedicated I/O threads:

index=_internal source=*metrics.log* thread=*dedicated* | timechart span=30s max(ratio) by thread

The setting can also be added to web.conf if the webui ratio is consistently > 0.8 and UI response is sluggish. The relevant metrics.log entry is:

Metrics - group=dutycycle, name=management, thread=webui, ratio=

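Indexer/Search Head: server.conf, [sslConfig]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| useClientSSLCompression | true | false | SSL compression used by the HTTP client for SH→SH, SH→IDX, SH→CM, and IDX→CM connections. | Change this setting if an increase in network bandwidth usage is not an issue. |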
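
To make the [httpServer] and [sslConfig] settings above concrete, here is a hedged sketch of a server.conf fragment for an indexer or search head whose mgmt_httpd ratio stays above 0.8. The setting names and limits are from the answer; the chosen values are illustrative assumptions.

```
# server.conf on an indexer or search head
# Illustrative values only.
[httpServer]
# default 12, max recommended 120
busyKeepAliveIdleTimeout = 60
# default auto, max recommended 5
dedicatedIoThreads = 4
# default round_robin
dedicatedIoThreadsSelectionPolicy = weighted_random

[sslConfig]
# default true; false trades higher network bandwidth usage for less compression work
useClientSSLCompression = false
```

After a change like this, the two metrics.log searches above can be re-run to see whether the mgmt_httpd and dedicated thread ratios have dropped.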

Deployment Client (DC): deploymentclient.conf, [deployment-client]

| Config | Default | Max recommended | Purpose | When to use |
|---|---|---|---|---|
| connect_timeout | 60 | 120 | DC->DS | If DCs are timing out while connecting ("Connect timeout"). |
| send_timeout | 60 | 120 | DC->DS | If there is a send timeout while sending a request to the DS. |
| recv_timeout | 60 | 120 | DC->DS | If the DC read timed out while waiting for a response from the DS. |
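
For deployment clients that report connect, send, or read timeouts against the deployment server, a minimal sketch of the override would be the fragment below. It simply raises all three settings from the default of 60 to the max recommended of 120; whether all three need changing depends on which timeout actually appears in the logs.

```
# deploymentclient.conf on the deployment client
# Illustrative: all three raised from the default (60) to the max recommended (120).
[deployment-client]
connect_timeout = 120
send_timeout = 120
recv_timeout = 120
```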

 

 

 

 

