After running out of disk space on a search head (part of a cluster), now fixed and all SHs rebooted, I get this error:
ConfReplicationException: Error pulling configurations from the search head cluster captain (SH2:8089); Error in fetchFrom, at=: Non-200 status_code=500: refuse request without valid baseline; snapshot exists at op_id=xxxx6e8e for repo=SH2:8089. Search head cluster member (SH3:8089) is having trouble pulling configs from the captain (SH2:8089).
Consider performing a destructive configuration resync on this search head cluster member.
Ran "splunk resync shcluster-replicated-config" and got this:
ConfReplicationException: Error downloading snapshot: Non-200 status_code=400: Error opening snapshot_file '/opt/splunk/var/run/snapshot/174xxxxxxxx82aca.bundle': No such file or directory.
The snapshot folder is usually empty; occasionally there are a few files, but they don't match what's on the other search heads.
'splunk show bundle-replication-status' is all green and identical to the other two SHs.
Is there a force-resync switch? I really can't remove this SH and run 'clean all'.
Thank you!
Hi @dmcnulty
The captain is refusing the sync request because the member doesn't have a valid baseline, and the subsequent resync attempt failed because a required snapshot file is missing or inaccessible.
The recommended action is to perform a destructive configuration resync on the affected member (SH3). This forces the member to discard its current replicated configuration and pull a fresh copy from the captain.
Run the following command on the affected search head member (SH3):
splunk resync shcluster-replicated-config --answer-yes
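After the resync completes, it's worth confirming the member has a clean view of the cluster before moving on. A quick sanity check (the -auth credentials are placeholders, substitute your own):

```shell
# On the affected member (SH3): show cluster status, including
# the current captain and each member's replication state.
splunk show shcluster-status --verbose -auth admin:changeme

# Confirm SH3 can actually reach the captain's management port.
curl -k https://SH2:8089/services/server/info
```

If SH3 reports the same captain as the other members and its status is "Up", the baseline should rebuild on the next replication cycle.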
If the destructive resync fails with the same or a similar error about a missing snapshot file, it may indicate a deeper problem with the captain's snapshot or with the member's ability to process the bundle. In that case, check the captain's splunkd.log for specific errors around replication bundles. If the issue persists, removing the member from the cluster and re-adding it is the standard, albeit more disruptive, next step.
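A hedged sketch of that fallback path, using the hostnames from this thread (SH1/SH2/SH3) and placeholder credentials; verify each step against your own deployment before running it:

```shell
# 1. On the captain (SH2): look for replication errors in splunkd.log.
grep -i "ConfReplication" /opt/splunk/var/log/splunk/splunkd.log | tail -20

# 2. If removal is unavoidable: detach SH3, run from another member (e.g. SH1).
splunk remove shcluster-member -mgmt_uri https://SH3:8089 -auth admin:changeme

# 3. On SH3: stop Splunk and clear the raft (captain-election) state.
splunk stop
splunk clean raft
splunk start

# 4. On SH3: rejoin the cluster by pointing at an existing member.
splunk add shcluster-member -current_member_uri https://SH1:8089 -auth admin:changeme
```

Step 3 only clears election state, not indexed data, so it is far less drastic than 'clean all', but it still removes the member from the cluster temporarily.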
I did run 'splunk resync shcluster-replicated-config'. I left it overnight and somehow SH3 synced itself. It also became the captain, which I changed back. Ran a resync on SH1 and all is good now.
No clue how or why it resynced itself after so many failed tries and clean-ups.