I upgraded a Splunk cluster to version 6.1.3 and now have a problem with cluster replication and thawed buckets. After the upgrade I restored archived buckets from S3 storage with Shuttl. A few days later I needed to restart the cluster peers, so I used "apply rolling restart to cluster peers" from the master node. After this operation the cluster never returned to a complete state because of the index with thawed buckets.
The logs of all the cluster peers show these errors:
08-29-2014 11:57:18.861 +0200 INFO CMReplicationRegistry - Starting replication: bid=kannel~122~F591FF7C-4140-4AC5-BB8F-29122221A60E src=F591FF7C-4140-4AC5-BB8F-29122221A60E target=6FEA8B9A-C28E-44E7-988F-82439A068E84
08-29-2014 11:57:18.861 +0200 WARN DatabaseDirectoryManager - unable to parse bucket type from the pathname='/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E'
08-29-2014 11:57:18.861 +0200 ERROR BucketReplicator - Unable to parse bucket name for bucketType=/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E
08-29-2014 11:57:18.862 +0200 INFO CMReplicationRegistry - Finished replication: bid=kannel~122~F591FF7C-4140-4AC5-BB8F-29122221A60E src=F591FF7C-4140-4AC5-BB8F-29122221A60E target=6FEA8B9A-C28E-44E7-988F-82439A068E84
08-29-2014 11:57:18.862 +0200 ERROR ClusterSlaveBucketHandler - Failed to trigger replication (err='Unable to parse bucket name for bucketType=/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E')
These errors are repeated for each thawed bucket.
It seems that the cluster tries to replicate the thawed buckets without success. But thawed buckets should not be affected by replication, or am I wrong?
This problem did not occur in Splunk 5.x.
I'm currently on Splunk Enterprise 7.0.0 and still experiencing this issue.
Does anyone have a solution yet?
Support told me it's fixed in 7.0.2.
Yes, still seeing this issue on 7.2.6:
12-03-2019 17:25:15.702 +0100 ERROR CMRepJob - failed job=CMRepJob guid=81882-0EFE-4404-9000-D99C1B1D078 hp=XXXXXXXX bid=cg_bam_integrations~24~813EFE-4404-9000-D99C1B41D078 tgtGuid=C3F86FE-819C-47DD3D63024F3 tgtHP=XXXXXXX tgtRP= useSSL=false tgt_hp=XXXXXX tgt_guid=C3F86FEC-819C-D63024F3 transErr="No error" peerErr="Failed to trigger replication (err='Unable to parse bucket name for bucket=/opt/spluntegrations/db/hot_4 Can't parse bucket path: /opt/b/splunk/tegrations/db/hot_v1_24')"
We have fixed this in releases 6.5.7, 6.6.5, and 7.0.2.
Hello,
is this bug fixed in 6.5.2? If so, why hasn't the documentation been updated at http://docs.splunk.com/Documentation/Splunk/6.5.2/Indexer/Restorearchiveddata#Clustered_data_thawing?
Thanks.
Thawed buckets do not participate in replication: the Cluster Master does not consider these buckets for replication, but it throws a series of error messages. Splunk has a bug open to suppress these error messages.
Current workarounds are:
1) Unthaw it as a standalone bucket (follow the standalone bucket naming convention). Note: make sure nobody unthaws the same bucket somewhere else; otherwise you may get duplicate results.
2) Don't use thaweddb; unthaw it as a clustered bucket into db/ or colddb/.
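For workaround 1, the naming difference is what trips the replicator: clustered bucket directories carry an origin-GUID suffix (visible in the error messages above), which the thaweddb path parser rejects. A rough sketch of the two conventions in Python (the regexes are my own approximation, not Splunk's actual parser):

```python
import re

# Bucket directory naming, as seen in the error logs above:
#   standalone: db_<newestTime>_<oldestTime>_<localId>
#   clustered:  db_<newestTime>_<oldestTime>_<localId>_<originGuid>
STANDALONE = re.compile(r"^db_\d+_\d+_\d+$")
CLUSTERED = re.compile(
    r"^db_\d+_\d+_\d+_"
    r"[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$"
)

def bucket_kind(dirname: str) -> str:
    """Classify a bucket directory name by its naming convention."""
    if STANDALONE.match(dirname):
        return "standalone"
    if CLUSTERED.match(dirname):
        return "clustered"
    return "unknown"
```

In other words, renaming the thawed directory to drop the GUID suffix would follow the standalone convention — but, as noted above, only thaw a bucket in one place, or you may get duplicate results.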
So basically the recommendation is to thaw to just one location, and then simply not replicate that bucket? So ideally restore the thawed bucket on a non-clustered indexer and search it from a clustered search head by adding this standalone indexer (with the thawed buckets) as a search peer?
This is still true.
This behavior is due to bug SPL-90468 ("Clustering: can't replicate thawed buckets").
The workaround is to not thaw the buckets on a cluster peer but to use an instance outside of the cluster, since the Search Head can search across both clustered and non-clustered indexers.
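To illustrate that workaround: the standalone indexer holding the thawed buckets can be added as an ordinary distributed search peer on the search head, for example via distsearch.conf (the hostname below is a placeholder; peers can also be added through the UI or the `splunk add search-server` CLI):

```ini
# distsearch.conf on the search head (sketch; host is hypothetical)
[distributedSearch]
servers = https://thawed-idx.example.com:8089
```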
Hello,
Is it fixed in V8+? 8.2.2 for instance.
Thanks.
Does this still affect more recent versions of Splunk? Is there a workaround that doesn't involve bringing new nodes online?
I do not see release notes for a fix for this particular bug (SPL-90468: Clustering), nor did I find it in the Known Issues for that release (6.1.3).
So where can I find the documentation on this bug?