Hi guys,
I have an issue with my Search Head cluster: the replication does not seem to be working.
192.168.192.131 is SearchHead1
192.168.192.136 is SearchHead2
11-15-2014 12:42:32.993 +0100 WARN ConfReplicationThread - Error pulling configurations from captain=https://192.168.192.131:8089, consecutiveErrors=966: Error in fetchFrom, at=: Non-200 status_code=500: refuse request without valid baseline; snapshot exists at op_id=1a4a26781bed0c57c325b1fd297fb07082eba435 for repo=https://192.168.192.131:8089
11-15-2014 12:42:32.990 +0100 ERROR HttpListener - Exception while processing request from 192.168.192.136 for /services/replication/configuration/commits?output_mode=json&at=: refuse request without valid baseline; snapshot exists at op_id=1a4a26781bed0c57c325b1fd297fb07082eba435 for repo=https://192.168.192.131:8089
The captain feature is working: if I stop the captain, the other Search Head becomes the captain (according to the command "splunk show shcluster-status").
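For reference, this is roughly how I check the cluster state on each member (assuming a default /opt/splunk install path; adjust to your own):

/opt/splunk/bin/splunk show shcluster-status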
Here is my server.conf on the Search Heads:
[shclustering]
conf_deploy_fetch_url = https://192.168.192.134:8089 # DEPLOYER URL
disabled = 0
mgmt_uri = https://192.168.192.136:8089 # IP OF CURRENT SERVER
pass4SymmKey = $1$oov1Lgj65W5z
replication_factor = 2
id = 6EFA87CF-8D4D-43D5-85D3-DE8BAD78403E
Does anyone see where my problem is?
Found the problem,
http://docs.splunk.com/Documentation/Splunk/6.2.0/DistSearch/Handlememberfailure
splunk resync shcluster-replicated-config
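In case it helps anyone else, here is roughly what I ran on the member that had fallen behind (SearchHead2, 192.168.192.136 in my setup; /opt/splunk is an assumed install path):

/opt/splunk/bin/splunk resync shcluster-replicated-config

As the docs caution, this overwrites the member's replicated search-related configuration, so run it only on the member that is out of sync.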
Some further update for errors like the one below:
08-01-2017 10:03:37.694 -0700 WARN ConfReplicationThread - Error pulling configurations from captain=https://:8089, consecutiveErrors=2 msg="Error in fetchFrom, at=ae823222d0607652969d338bb793469fb7de85cd: Network-layer error: Connect Timeout
Please note that a consecutiveErrors count of 10 or less is not considered a real issue. It can happen because the captain side is busy and not able to respond in time.
Check what the consecutiveErrors count is for you, using a search like:
index=_internal (host= OR host= OR host= OR host=) source="splunkd.log" "ConfReplicationThread - Error pulling configurations from captain" | stats max(consecutiveErrors) by host
It's not an issue if consecutiveErrors < 10. If the count goes above 10, log a case with Splunk Support.
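For illustration only, with hypothetical host names sh1 and sh2 substituted for your actual search head members, the search would look like this:

index=_internal (host=sh1 OR host=sh2) source="splunkd.log" "ConfReplicationThread - Error pulling configurations from captain" | stats max(consecutiveErrors) by host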
Normally this error means that a Search Head Cluster member has fallen behind in replication - I think it is a good idea to debug why configurations aren't syncing in the first place and address the root cause.
A destructive resync is only truly required if the member has fallen really far behind the captain -- i.e. 20000 changes behind (by default) -- or if local state is completely corrupted/invalid (e.g. corrupt filesystem).
For Search Head Clustering, please refer to the answers below to ensure that the Search Head Cluster members are configured as per requirements.
I downvoted this post because it doesn't fix the underlying issue (i.e. identify the cause of the replication bottleneck); it just temporarily works around it.
Ok. But in the docs it states:
"Caution: This command causes an overwrite of the member's entire set of search-related configurations, resulting in the loss of any local changes."
What does "loss of local changes" mean with this? Any changes that have been made are lost? For all time? For the last hour?
It means any changes that you have made to that search head alone, as opposed to those changes that get propagated (through either the deployer or automatic replication) across the set of cluster members.
I guess you lose all changes since the last replication.
Without replication to other Search Head members, your changes are local.
This is how I understand it.