Splunk Enterprise

Error downloading snapshot

dmcnulty
Explorer

One search head (part of a cluster) ran out of disk space. That is now fixed and all SHs have been rebooted.

I get this error:

ConfReplicationException Error pulling configurations from the search head cluster captain (SH2:8089); Error in fetchFrom, at=: Non-200 status_code=500: refuse request without valid baseline; snapshot exists at op_id=xxxx6e8e for repo=SH2:8089".  Search head cluster member (SH3:8089) is having trouble pulling configs from the captain (SH2:8089).   xxxxx
Consider performing a destructive configuration resync on this search head cluster member.

 

I ran "splunk resync shcluster-replicated-config" and got this:

ConfReplicationException : Error downloading snapshot: Non-200 status_code=400: Error opening snapshot_file' /opt/splunk/var/run/snapshot/174xxxxxxxx82aca.bundle: No such file or directory. 

 

The snapshot folder is usually empty; occasionally it contains a few files, but they don't match what's on the other search heads.

'splunk show bundle-replication-status' is all green and matches the other two SHs.

 

Is there a force-resync switch? I really can't remove this SH and run 'clean all'.

 

Thank you!

 

 


livehybrid
SplunkTrust

Hi @dmcnulty 

The captain is refusing the sync request because the member doesn't have a valid baseline, and the subsequent resync attempt failed because a required snapshot file is missing or inaccessible.
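As a quick sanity check before resyncing, you could compare the snapshot directory named in the error on both the member and the captain (the path below is taken from your error; adjust it if your $SPLUNK_HOME differs):

# Run on SH3 and again on SH2, then compare the listings
ls -l /opt/splunk/var/run/snapshot/

If the bundle named in the resync error doesn't exist on either node, that would explain the 400 "No such file or directory" response.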

The recommended action is to perform a destructive configuration resync on the affected member (SH3). This forces the member to discard its current replicated configuration and pull a fresh copy from the captain.

Run the following command on the affected search head member (SH3):

splunk resync shcluster-replicated-config --answer-yes
  • This command discards the member's replicated configuration on SH3 (the replicated app and user settings) and fetches a complete, fresh copy from the captain.
  • Ensure the captain (SH2) is healthy and has sufficient disk space and resources before running this command (see the quick checks below).
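A minimal set of pre-checks, assuming a default /opt/splunk install (adjust paths and credentials for your environment):

# On the captain (SH2): confirm the freed disk space is really available
df -h /opt/splunk

# On any member: confirm SH2 is still the elected captain and all members show as Up
/opt/splunk/bin/splunk show shcluster-status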

If the destructive resync fails with the same or a similar error about a missing snapshot file, it may point to a deeper problem with the captain's snapshot or with the member's ability to process the bundle. In that case, check the captain's splunkd.log for errors around configuration replication and snapshots. If the issue still persists, removing the member from the cluster and re-adding it is the standard, albeit more disruptive, next step.
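A rough sketch of both follow-up steps, assuming a default install path and using your hostnames as placeholders; the remove/re-add lines are only the core CLI commands, so follow the SHC documentation for the full procedure:

# On the captain (SH2): look for conf replication / snapshot errors
grep -iE "ConfReplication|snapshot" /opt/splunk/var/log/splunk/splunkd.log | tail -n 50

# Last resort, run on the affected member (SH3): remove it, then re-add it by pointing at a healthy member
/opt/splunk/bin/splunk remove shcluster-member
/opt/splunk/bin/splunk add shcluster-member -current_member_uri https://SH2:8089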

 



dmcnulty
Explorer

I did run 'splunk resync shcluster-replicated-config'. I left it overnight and somehow SH3 synced itself. It had also become the captain, which I changed back. I then ran a resync on SH1 and all is good now.

No clue how or why it resynced itself after many failed tries and cleanups.
