Can anyone help why we are seeing these WARN in logs and how to fix permanently.
We are performing manual resync whenever count of events is > 5 in 15mins time range using below query:
index=_internal host=searchhead* component=ConfReplicationThread log_level=WARN "Cannot accept push" |bin span=15m _time|stats max(consecutiveErrors) as count by host,_time|where count>5
LOGS:
==========
WARN ConfReplicationThread - Error pushing configurations to captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=8d89fca5ef4520b00b8ffe8b1366a178b92b52fb; current_baseline_op_id=a948f0e3f0fcae707ce37ca7d7a73"
ConfReplicationThread - Error pushing configurations to captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in acceptPush: Non-200 status_code=400: ConfReplicationException: Cannot accept push with outdated_baseline_op_id=66098bdc22c2bcacf951fb104558db365ac64820; current_baseline_op_id=085e675e4c9d8c9fafabee"
ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"
ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"
ConfReplicationThread - Error pulling configurations from captain=https://searchhead01.domain.com:8089, consecutiveErrors=1 msg="Error in fetchFrom, at=a6a747e7138353bd07873f04fe90f2c9b4564567: Network-layer error: Connect Timeout"
=============
Even tried to reduce the max_push count to 50 (default is 100). How can we resolve this permanently ?
==============
WARN ConfMetrics - single_action=PUSH_TO took wallclock_ms=1525! Consider a lower value of conf_replication_max_push_count in server.conf on all members.
WARN ConfMetrics - single_action=PUSH_TO took wallclock_ms=2644! Consider a lower value of conf_replication_max_push_count in server.conf on all members.
WARN ConfMetrics - single_action=PULL_FROM took wallclock_ms=2011! Consider a lower value of conf_replication_max_pull_count in server.conf on all members.
WARN ConfMetrics - single_action=PULL_FROM took wallclock_ms=1778! Consider a lower value of conf_replication_max_pull_count in server.conf on all members.
Below are the settings in server.conf
conf_replication_max_push_count = 50
conf_replication_purge.period = 3h
conf_replication_period = 10
we do not want to do resync everytime.
splunk resync shcluster-replicated-config
Based on warn one of your search member is in out of sync. The search member is trying to update search head captain about the change made on search member.
Did you try initiating search member rolling restart?
if not, try restarting your search head cluster.
Yes, Search Head restart and Resyn manually has been done already.
Still seeing same repetative warnings