Splunk Enterprise

ConfReplicationThread & ConfMetrics WARN on SearchHeads

venkateshparank
Path Finder

Can anyone help explain why we are seeing these WARN messages in the logs, and how to fix them permanently?

We perform a manual resync whenever the highest consecutiveErrors value for a host exceeds 5 within a 15-minute window, using the query below:

index=_internal host=searchhead* component=ConfReplicationThread log_level=WARN "Cannot accept push" | bin span=15m _time | stats max(consecutiveErrors) as count by host, _time | where count > 5
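For readers less familiar with SPL, the threshold logic of that query can be sketched in Python. This is only an illustration: the function name, the `(epoch_seconds, host, consecutive_errors)` tuple layout, and the host name in the usage note are assumptions, and the bucketing only roughly mirrors how `bin span=15m` aligns time.

```python
BUCKET_SECONDS = 15 * 60  # 15-minute window, as in `bin span=15m _time`


def hosts_over_threshold(events, threshold=5):
    """Return {(host, bucket_start): peak} for buckets whose peak exceeds threshold.

    `events` is an iterable of (epoch_seconds, host, consecutive_errors) tuples;
    this layout is hypothetical and stands in for the fields Splunk extracts.
    """
    peak = {}
    for ts, host, errors in events:
        # Snap the timestamp down to the start of its 15-minute bucket.
        bucket = int(ts) - int(ts) % BUCKET_SECONDS
        key = (host, bucket)
        # Equivalent of `stats max(consecutiveErrors) as count by host, _time`.
        peak[key] = max(errors, peak.get(key, 0))
    # Equivalent of `where count > 5`.
    return {key: value for key, value in peak.items() if value > threshold}
```

Called with `[(0, "searchhead02", 3), (60, "searchhead02", 6), (900, "searchhead02", 2)]`, only the first bucket is flagged, because the value compared against the threshold is the peak consecutiveErrors in the window, not the number of WARN events.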

LOGS:

==========

WARN ConfReplicationThread - Error pushing configurations to captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=8d89fca5ef4520b00b8ffe8b1366a178b92b52fb; current_baseline_op_id=a948f0e3f0fcae707ce37ca7d7a73"

ConfReplicationThread - Error pushing configurations to captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=66098bdc22c2bcacf951fb104558db365ac64820; current_baseline_op_id=085e675e4c9d8c9fafabee"

ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"

ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"

ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"

=============
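The excerpt above mixes two different failures: pushes rejected because of an outdated baseline, and pulls that fail with a connect timeout. A minimal sketch for separating them is shown here; the regular expression is written against the sample lines in this post only, and the function names and cause labels are made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the sample lines in this thread; other Splunk
# versions may format these messages differently.
LINE_RE = re.compile(
    r"ConfReplicationThread - Error (?P<direction>pushing|pulling) configurations "
    r"(?:to|from) captain=(?P<captain>\S+), "
    r"consecutiveErrors=(?P<errors>\d+) msg=\"(?P<msg>.*)\"$"
)


def classify(line):
    """Parse one WARN line into a dict, or return None if it does not match."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    msg = match.group("msg")
    if "outdated_baseline_op_id" in msg:
        cause = "outdated_baseline"   # captain rejected the push
    elif "Connect Timeout" in msg:
        cause = "connect_timeout"     # member could not reach the captain
    else:
        cause = "other"
    return {
        "direction": match.group("direction"),
        "captain": match.group("captain"),
        "consecutive_errors": int(match.group("errors")),
        "cause": cause,
    }


def summarize(lines):
    """Count matching lines per (direction, cause) pair."""
    counts = Counter()
    for line in lines:
        record = classify(line)
        if record is not None:
            counts[(record["direction"], record["cause"])] += 1
    return counts
```

Feeding the five lines above to `summarize` gives separate counts for the push and pull failures, which helps decide whether to look at baseline drift or at network connectivity to the captain on port 8089 first.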

We even tried reducing conf_replication_max_push_count to 50 (the default is 100). How can we resolve this permanently?

==============

WARN ConfMetrics - single_action=PUSH_TO took wallclock_ms=1525! Consider a lower value of conf_replication_max_push_count in server.conf on all members.
WARN ConfMetrics - single_action=PUSH_TO took wallclock_ms=2644! Consider a lower value of conf_replication_max_push_count in server.conf on all members.
WARN ConfMetrics - single_action=PULL_FROM took wallclock_ms=2011! Consider a lower value of conf_replication_max_pull_count in server.conf on all members.
WARN ConfMetrics - single_action=PULL_FROM took wallclock_ms=1778! Consider a lower value of conf_replication_max_pull_count in server.conf on all members.
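To see which replication action is slowest, the ConfMetrics warnings can be reduced to a per-action maximum. The sketch below assumes the message format shown in the four lines above; the function name is hypothetical.

```python
import re

# Matches the ConfMetrics warnings quoted in this post.
METRIC_RE = re.compile(
    r"ConfMetrics - single_action=(?P<action>\w+) took wallclock_ms=(?P<ms>\d+)!"
)


def slowest_by_action(lines):
    """Return {action: largest wallclock_ms seen} for the matching lines."""
    worst = {}
    for line in lines:
        match = METRIC_RE.search(line)
        if match is None:
            continue  # skip anything that is not a ConfMetrics timing warning
        action = match.group("action")
        elapsed_ms = int(match.group("ms"))
        worst[action] = max(elapsed_ms, worst.get(action, 0))
    return worst
```

For the four warnings above, this reports a peak for both PUSH_TO and PULL_FROM, so both directions are slow, not only the push that was already tuned.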

 

Below are the current settings in server.conf:

conf_replication_max_push_count = 50
conf_replication_purge.period = 3h
conf_replication_period = 10
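The PULL_FROM warnings name conf_replication_max_pull_count, which does not appear among the settings listed here. A rough way to check which of the two settings named in the warnings are explicitly set on a member is sketched below. It assumes the settings live under a `[shclustering]` stanza (the post does not show the stanza header) and that Python's `configparser` can read the file, which holds for simple files but not necessarily for every Splunk .conf file.

```python
import configparser

# Settings named in the ConfMetrics warnings.
WATCHED = ("conf_replication_max_push_count", "conf_replication_max_pull_count")

# The stanza header is an assumption; the values are the ones from the post.
SAMPLE = """\
[shclustering]
conf_replication_max_push_count = 50
conf_replication_purge.period = 3h
conf_replication_period = 10
"""


def explicit_settings(conf_text, stanza="shclustering"):
    """Return {setting: value or None} for the watched settings in one stanza."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        return {name: None for name in WATCHED}
    return {name: parser.get(stanza, name, fallback=None) for name in WATCHED}
```

With `SAMPLE`, the push count comes back as "50" and the pull count as None, meaning the pull side is still running with whatever the default is. Whether lowering it removes the warnings in this environment is untested here.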

 

We do not want to run a manual resync every time:

splunk resync shcluster-replicated-config


thambisetty
Super Champion

Based on the warnings, one of your search head cluster members is out of sync with the captain. The member is trying to push its local configuration changes to the captain, and the captain is rejecting the push because the member's baseline is outdated.

Have you tried a rolling restart of the search head cluster members?

If not, try restarting your search head cluster.


venkateshparank
Path Finder

Yes, both a search head restart and a manual resync have already been done.

We are still seeing the same repetitive warnings.
