I upgraded a Splunk cluster to version 6.1.3 and now have a problem with cluster replication and thawed buckets. After the upgrade I restored archived buckets from S3 storage with Shuttl. A few days later I needed to restart the cluster peers, so I ran "apply rolling restart cluster peers" from the master node. After this operation the cluster never returned to a complete state, because of the index containing the thawed buckets.
The logs of all the cluster peers contain these errors:
08-29-2014 11:57:18.861 +0200 INFO CMReplicationRegistry - Starting replication: bid=kannel~122~F591FF7C-4140-4AC5-BB8F-29122221A60E src=F591FF7C-4140-4AC5-BB8F-29122221A60E target=6FEA8B9A-C28E-44E7-988F-82439A068E84
08-29-2014 11:57:18.861 +0200 WARN DatabaseDirectoryManager - unable to parse bucket type from the pathname='/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E'
08-29-2014 11:57:18.861 +0200 ERROR BucketReplicator - Unable to parse bucket name for bucketType=/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E
08-29-2014 11:57:18.862 +0200 INFO CMReplicationRegistry - Finished replication: bid=kannel~122~F591FF7C-4140-4AC5-BB8F-29122221A60E src=F591FF7C-4140-4AC5-BB8F-29122221A60E target=6FEA8B9A-C28E-44E7-988F-82439A068E84
08-29-2014 11:57:18.862 +0200 ERROR ClusterSlaveBucketHandler - Failed to trigger replication (err='Unable to parse bucket name for bucketType=/opt/splunk/var/lib/splunk/kannel/thaweddb/db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E')
These errors are repeated for each thawed bucket.
It seems that the cluster tries to replicate the thawed buckets and fails.
But thawed buckets should not be subject to replication at all, or am I wrong?
Splunk 5.x did not have this problem.
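To confirm that the errors above really fire once for every thawed bucket, you can pull the distinct bucket paths out of splunkd.log on each peer. This is a minimal sketch, assuming the WARN message text matches the 6.1.3 lines quoted above (other versions may word it differently); the function name `failing_thawed_buckets` is hypothetical.

```python
import re
from typing import Iterable, List

# Text of the DatabaseDirectoryManager WARN line quoted above (Splunk 6.1.3);
# other versions may word the message differently.
PATH_RE = re.compile(r"unable to parse bucket type from the pathname='([^']+)'")


def failing_thawed_buckets(lines: Iterable[str], only_thawed: bool = True) -> List[str]:
    """Return each distinct bucket path named in the warning, in first-seen order."""
    paths: List[str] = []
    for line in lines:
        for match in PATH_RE.finditer(line):
            path = match.group(1)
            # Optionally keep only buckets that live under a thaweddb/ directory.
            if only_thawed and "/thaweddb/" not in path:
                continue
            if path not in paths:
                paths.append(path)
    return paths
```

For example, with SPLUNK_HOME at /opt/splunk as in the paths above, you could pass the open file `/opt/splunk/var/log/splunk/splunkd.log` to `failing_thawed_buckets` and compare the result with the contents of the index's thaweddb/ directory.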
This behavior is due to bug SPL-90468: "Clustering: can't replicate thawed buckets".
The workaround is to thaw the buckets not on a cluster peer but on an instance outside the cluster, since the search head can search across both clustered and non-clustered indexers.
Does this still affect more recent versions of Splunk? Is there a workaround that doesn't involve bringing new nodes online?
I don't see a fix for bug SPL-90468 in the release notes, nor is it listed under Known Issues for that release (6.1.3).
So where can I find documentation on this bug?
Thawed buckets are not replicated: the Cluster Master does not consider these buckets for replication, but a series of error messages is still thrown. Splunk has a bug open to suppress these messages.
The current workarounds are:
1) Thaw it as a standalone bucket (follow the standalone bucket naming convention). Note: make sure nobody thaws the same bucket anywhere else, otherwise you may get duplicate results.
2) Don't use thaweddb/; instead thaw it as a clustered bucket into db/ or colddb/.
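Both workarounds come down to the bucket directory name. As documented for Splunk 6.x indexers and clusters, a standalone warm/cold bucket is named `db_<newest>_<oldest>_<localid>`, while a clustered bucket carries the originating peer's GUID as a suffix (`db_..._<guid>` on the originating peer, `rb_..._<guid>` for replicated copies). The thawed bucket in the log above still has the clustered form. The sketch below assumes that "follow the standalone naming convention" in workaround 1 means dropping the GUID; the helper names are hypothetical, and the code does not check that the local ID is unique in the target index.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Bucket directory naming as documented for Splunk 6.x (an assumption for other versions):
#   standalone:         db_<newest>_<oldest>_<localid>
#   clustered (origin): db_<newest>_<oldest>_<localid>_<guid>
#   clustered (copy):   rb_<newest>_<oldest>_<localid>_<guid>
BUCKET_RE = re.compile(
    r"^(?P<prefix>db|rb)_(?P<newest>\d+)_(?P<oldest>\d+)_(?P<localid>\d+)"
    r"(?:_(?P<guid>[0-9A-Fa-f-]{36}))?$"
)


@dataclass(frozen=True)
class BucketName:
    prefix: str
    newest: int
    oldest: int
    localid: int
    guid: Optional[str]

    @property
    def clustered(self) -> bool:
        # Only clustered bucket names carry the originating peer GUID.
        return self.guid is not None


def parse_bucket_name(name: str) -> BucketName:
    m = BUCKET_RE.match(name)
    if m is None:
        raise ValueError(f"not a warm/cold bucket directory name: {name!r}")
    return BucketName(m["prefix"], int(m["newest"]), int(m["oldest"]),
                      int(m["localid"]), m["guid"])


def standalone_name(bucket: BucketName) -> str:
    """Workaround 1 (hypothetical helper): drop the GUID to get a standalone name."""
    return f"db_{bucket.newest}_{bucket.oldest}_{bucket.localid}"


def clustered_name(bucket: BucketName) -> str:
    """Workaround 2 (hypothetical helper): keep the clustered db_ name for db/ or colddb/."""
    if bucket.guid is None:
        raise ValueError("a clustered bucket name needs the originating peer GUID")
    return f"db_{bucket.newest}_{bucket.oldest}_{bucket.localid}_{bucket.guid}"
```

For the bucket in the log, `standalone_name(parse_bucket_name("db_1392106608_1391194240_122_F591FF7C-4140-4AC5-BB8F-29122221A60E"))` yields `db_1392106608_1391194240_122`.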
So basically the recommendation is to thaw a bucket into just one location and then simply not replicate it? Ideally, then, restore the thawed buckets on a non-clustered indexer and search them from the clustered search head by adding this standalone indexer (with the thawed buckets) as a search peer.
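The "thaw into one location only" rule, and the warning in workaround 1 about duplicate results, can be checked mechanically if you collect the bucket directory names from every place buckets were thawed. This is a hypothetical helper, not a Splunk tool: `duplicate_thaws` and the inventory format (a location label mapped to directory names) are assumptions, and gathering the listings from each host is left out. The key deliberately ignores the GUID so that a bucket renamed to the standalone convention still matches its clustered original.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Loose match for db_/rb_ bucket directories, with or without a trailing peer GUID.
NAME_RE = re.compile(r"^(?:db|rb)_(\d+)_(\d+)_(\d+)(?:_[0-9A-Fa-f-]{36})?$")


def duplicate_thaws(locations: Dict[str, Iterable[str]]) -> Dict[Tuple[int, int, int], List[str]]:
    """Return (newest, oldest, localid) keys found in two or more locations.

    Heuristic: ignoring the GUID can flag unrelated buckets that happen to
    share all three numbers, so treat hits as candidates to inspect.
    """
    seen: Dict[Tuple[int, int, int], List[str]] = defaultdict(list)
    for location, names in locations.items():
        for name in names:
            m = NAME_RE.match(name)
            if m is None:
                continue  # skip hot buckets and anything that is not a bucket directory
            key = (int(m[1]), int(m[2]), int(m[3]))
            if location not in seen[key]:
                seen[key].append(location)
    return {key: locs for key, locs in seen.items() if len(locs) > 1}
```

An empty result means no bucket appears in more than one of the listed locations.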
I'm currently on Splunk Enterprise 7.0.0 and am still experiencing this issue.
Does anyone have a solution yet?