Splunk Enterprise

Error downloading snapshot

dmcnulty
Explorer

A search head (part of a cluster) ran out of disk space. That is now fixed and all SHs have been rebooted, but I still get this error:

ConfReplicationException: Error pulling configurations from the search head cluster captain (SH2:8089); Error in fetchFrom, at=: Non-200 status_code=500: refuse request without valid baseline; snapshot exists at op_id=xxxx6e8e for repo=SH2:8089. Search head cluster member (SH3:8089) is having trouble pulling configs from the captain (SH2:8089).
Consider performing a destructive configuration resync on this search head cluster member.

Ran "splunk resync shcluster-replicated-config"  and get this:

ConfReplicationException: Error downloading snapshot: Non-200 status_code=400: Error opening snapshot_file='/opt/splunk/var/run/snapshot/174xxxxxxxx82aca.bundle': No such file or directory.

The snapshot folder is usually empty; sometimes it contains a few files, but they don't match the other search heads.

'splunk show bundle-replication-status' is all green and shows the same as on the other two SHs.

Is there a force resync switch? I really can't remove this SH and run 'clean all'.

Thank you!

livehybrid
Super Champion

Hi @dmcnulty 

The captain is refusing the sync request because the member doesn't have a valid baseline, and the subsequent resync attempt failed because a required snapshot file is missing or inaccessible.

The recommended action is to perform a destructive configuration resync on the affected member (SH3). This forces the member to discard its current replicated configuration and pull a fresh copy from the captain.

Run the following command on the affected search head member (SH3):

splunk resync shcluster-replicated-config --answer-yes
  • This command will discard SH3's replicated configuration (the replicated settings under $SPLUNK_HOME/etc/apps and $SPLUNK_HOME/etc/users) and attempt to fetch a complete, fresh copy from the captain.
  • Ensure the captain (SH2) is healthy and has sufficient disk space and resources before running this command (see the quick checks sketched below).
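
A quick pre-flight check on the captain might look like this (just a sketch, assuming a default /opt/splunk install; adjust paths and add -auth as needed for your environment):

# On the captain (SH2): confirm captaincy and overall cluster state
/opt/splunk/bin/splunk show shcluster-status

# Confirm there is free disk space for snapshot/bundle creation
df -h /opt/splunk

# Check whether snapshot files are actually being written
ls -l /opt/splunk/var/run/snapshot/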

If the destructive resync fails with the same or a similar error about a missing snapshot file, it may indicate a more serious issue with the captain's snapshot or with the member's ability to process the bundle. In that case, check the captain's splunkd.log for specific errors around replication bundles. If the issue persists, removing the member from the cluster and re-adding it is the standard, albeit more disruptive, next step.
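
To surface the relevant log entries, something like the following should work (a sketch; the /opt/splunk path is an assumption, and ConfReplicationThread is the component that typically logs these errors):

# On the captain (SH2) and the affected member (SH3)
grep -i "ConfReplication" /opt/splunk/var/log/splunk/splunkd.log | tail -50

Or search the internal index from a working search head:

index=_internal sourcetype=splunkd log_level=ERROR ConfReplication*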

 


dmcnulty
Explorer

I did run 'splunk resync shcluster-replicated-config'. I left it overnight and somehow SH3 sync'd itself. It also became the captain, which I changed back. I ran a sync on SH1 and all is good now.

No clue how or why it resync'd itself after many failed tries and cleanups.
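
For anyone who hits the same captain flip, captaincy can be transferred back with something like this (assuming dynamic captaincy; the SH2 URI is a placeholder for your own):

splunk transfer shcluster-captain -mgmt_uri https://SH2:8089

Run it from any current member; the member at the given management URI becomes the new captain.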
