Explanation of various HTTP(S) timeouts

hrawat_splunk · 12-22-2022

Explanation of the various HTTP(S)-related timeouts and their impact.

hrawat_splunk · 12-22-2022

This table explains each timeout config and when and how to use it.

Instance	Conf file	Stanza	Config	Default	Max Recommended	Purpose	When to use
Search Head	distsearch.conf	[distributedSearch]	statusTimeout	10 sec	120 sec	Connect/read/write timeout for fetching the search peer's info (from the SH main splunkd to the peer main splunkd). One thread handles many peers.	Check DistributedPeer* component WARN/ERROR messages for timeouts. Timeouts affect bundle replication because peers are marked down even though they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle).
			sendTimeout	30 sec	120 sec	Send/write timeout from the SH search process to the peer's main splunkd. One thread handles many peers.	Check search.log for send/write failures. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle).
			receiveTimeout	600 sec	1200 sec	Read/receive timeout from the SH search process to the peer's main splunkd. One thread handles many peers.	Check search.log for read failures.
			connectionTimeout	10 sec	120 sec	Connect timeout from the SH search process to the peer's main splunkd. One thread handles many peers.	Check search.log for connect failures. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle).
			authTokenConnectionTimeout	5 sec	120 sec	Connect timeout for fetching the search peer's auth token (from the SH main splunkd to the peer main splunkd). One thread handles many peers.	Check DistributedPeer* component WARN/ERROR messages for timeouts. Timeouts affect bundle replication because peers are marked down even though they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle).
			authTokenSendTimeout	10 sec	120 sec	Send/write timeout for fetching the search peer's auth token (from the SH main splunkd to the peer main splunkd). One thread handles many peers.	Check DistributedPeer* component WARN/ERROR messages for timeouts. Timeouts affect bundle replication because peers are marked down even though they are up and running. This can happen due to a busy mgmt port on the peer (check the peer's metrics.log for dutycycle).
			authTokenReceiveTimeout	10 sec	120 sec	Read/receive timeout for fetching the search peer's auth token (from the SH main splunkd to the peer main splunkd). One thread handles many peers.	Check DistributedPeer* component WARN/ERROR messages for timeouts. Timeouts affect bundle replication because peers are marked down even though they are up and running.
		[replicationSettings]	connectionTimeout	60 sec	120 sec	Connect timeout for the actual bundle replication from the SH main splunkd to the peer main splunkd. One thread handles one peer.	Affects the actual bundle replication. Normally this never fails.
			sendRcvTimeout	60 sec	120 sec	Read/write timeout for the actual bundle replication from the SH main splunkd to the peer main splunkd. One thread handles one peer.	Affects the actual bundle replication. Normally this never fails.
	limits.conf	[search]	remote_timeline_connection_timeout	5 sec	30 sec	Connect timeout from the SH search process to the peer's main splunkd for the timeliner. Multi-threaded.	Normally this never fails.
			remote_timeline_send_timeout	10 sec	30 sec	Send timeout from the SH search process to the peer's main splunkd for the timeliner. Multi-threaded.	Normally this never fails.
	server.conf	[shclustering]	cxn_timeout_raft	2 sec	30 sec	SH to SH raft communication connect timeout. One thread handles one member.
			send_timeout_raft	5 sec	30 sec	SH to SH raft communication send timeout. One thread handles one member.
			rcv_timeout_raft	5 sec	30 sec	SH to SH raft communication read timeout. One thread handles one member.
Indexer	distsearch.conf	[replicationSettings]	sendRcvTimeout	60 sec	120 sec	Read/write timeout used by the search peer for bundle replication (from the peer main splunkd to the SH main splunkd).	Affects the actual bundle replication.
Indexer/Search Head	server.conf	[httpServer]	busyKeepAliveIdleTimeout	12 sec	120 sec	Keep-alive connection idle timeout for the HTTP server on a SH, search peer, or SHC member.	When there are more than 60 HTTP connections, the server disconnects idle connections. Affects bundle replication and search.
			streamInWriteTimeout	5 sec	30 sec	Read/write timeout while the HTTP server receives an HTTP request.	Normally this never fails.
			dedicatedIoThreads	auto	5	Lets the HTTP listener thread hand socket reads/writes off to additional I/O threads.	Very useful for knowledge bundle replication, config replication, raft, search artifact replication, and searches. Use it when metrics.log shows mgmt_httpd consistently > 0.8: index=_internal source=*metrics.log* mgmt_httpd | timechart span=30s max(mgmt_httpd). For the dedicated I/O threads: index=_internal source=*metrics.log* thread=dedicated | timechart span=30s max(ratio) by thread. The same setting can be added to web.conf if the webui ratio is consistently > 0.8 and UI response is sluggish (metrics: group=dutycycle, name=management, thread=webui, ratio).
			dedicatedIoThreadsSelectionPolicy	round_robin	weighted_random
		[sslConfig]	useClientSSLCompression	true	false	SSL compression used by the HTTP client for SH→SH / SH→IDX / SH→CM / IDX→CM connections.	Set this to false only if the resulting increase in network bandwidth usage is not an issue.
DC	deploymentclient.conf	[deployment-client]	connect_timeout	60 sec	120 sec	DC→DS connect timeout.	If DCs are timing out while connecting ("Connect timeout").
DC	deploymentclient.conf	[deployment-client]	send_timeout	60 sec	120 sec	DC→DS send timeout.	If there is a send timeout while the DC sends a request to the DS.
DC	deploymentclient.conf	[deployment-client]	recv_timeout	60 sec	120 sec	DC→DS receive timeout.	If the DC times out reading while waiting for a response from the DS.
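
As a sketch of how the search head rows of the table translate into configuration, the fragment below raises the distributedSearch status and auth-token timeouts and the SHC raft timeouts. The values are illustrative assumptions, each kept between the setting's default and its max recommended value, not tuned recommendations; it also assumes local overrides on the search head (for example under $SPLUNK_HOME/etc/system/local/). Raise these only when DistributedPeer* WARN/ERROR messages or a busy mgmt port on the peers point to timeouts.

```ini
# Illustrative search head overrides (values are assumptions, each kept
# between the default and the max recommended value in the table).

# --- distsearch.conf ---
[distributedSearch]
# Peer status/info checks (default 10, max recommended 120)
statusTimeout = 30
# Auth token fetch from peers (defaults 5/10/10, max recommended 120)
authTokenConnectionTimeout = 30
authTokenSendTimeout = 30
authTokenReceiveTimeout = 30

# --- server.conf ---
[shclustering]
# SH-to-SH raft timeouts (defaults 2/5/5, max recommended 30)
cxn_timeout_raft = 10
send_timeout_raft = 10
rcv_timeout_raft = 10
```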

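A similar sketch for the [httpServer] and [sslConfig] rows of server.conf on an indexer or search head. The thread count and timeout value are assumptions for illustration; the table only gives defaults and upper bounds. The dedicatedIoThreads change is intended for hosts where the mgmt_httpd duty-cycle ratio stays above 0.8, and disabling client SSL compression trades CPU for higher network bandwidth usage.

```ini
# server.conf on an indexer or search head (illustrative values only)
[httpServer]
# Keep-alive idle timeout; idle connections are dropped once more than
# 60 HTTP connections are open (default 12, max recommended 120)
busyKeepAliveIdleTimeout = 60
# Extra socket I/O threads for the HTTP listener (default auto, max recommended 5)
dedicatedIoThreads = 4
# How work is spread across those threads (default round_robin)
dedicatedIoThreadsSelectionPolicy = weighted_random

[sslConfig]
# Disable client-side SSL compression; expect higher network bandwidth usage
useClientSSLCompression = false
```
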

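For deployment clients that log connect, send, or receive timeouts against the deployment server, the [deployment-client] stanza can be raised toward the max recommended value. Setting all three to 120 here is an assumption for illustration; in practice, raise only the timeout that is actually failing.

```ini
# deploymentclient.conf on the deployment client (illustrative values)
[deployment-client]
# DC->DS connect timeout (default 60, max recommended 120)
connect_timeout = 120
# DC->DS send timeout (default 60, max recommended 120)
send_timeout = 120
# DC->DS receive timeout (default 60, max recommended 120)
recv_timeout = 120
```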