What we did was:

- Restored two old peer nodes from a backup.
- Cloned the master node to set up a shadow cluster and adapted the replication factor on this clone to 2. This allowed us to make a mini-cluster which is fully balanced, so both restored peer nodes would have all data. I did notice, however, that on one of the two recovered nodes the colddb location remained empty.
- Placed the shadow cluster in maintenance mode and removed one of the peer nodes.
- Reconfigured this peer to connect to the production cluster. Also changed the name in server.conf and removed instance.cfg to prevent duplicate peer names and GUIDs.

When I check the "Settings / Indexer Clustering" page on the master, it does show the recovered node, and the "Indexes" tab on the same page shows all indexes as green. But when I search for the earliest data (earliestTime), the older events which sit on the recovered peer are not seen. Only when I add the recovered peer to distsearch.conf does the search find the older events (conf snippet at the end of this post). And when I remove the recovered peer from the cluster again, the older events are gone again, which indicates those cold buckets were never synced to the production nodes.

The buckets have not rolled to frozen, because frozenTimePeriodInSecs for the index is set to 157248000 (about 5 years) and the data I am trying to recover is from 2020. I also just ran a dbinspect, and it reports no errors on the cold buckets on the restored host: the path is the colddb path and the state is 'cold', as expected.

Eventually I would like to remove the recovered peer from the cluster again, since it is still running RHEL7 and has to be switched off. So I am looking for a way to safely get the data onto the RHEL9 nodes.

As a side track, I also want to understand how warm/cold buckets are handled, because if they are indeed not replicated, that would also explain why they were lost in the first place: the RHEL9 nodes were clean installations which replaced the RHEL7 nodes. The rough procedure followed in that migration was:

- Add an additional "overflow" peer to the cluster and make sure the cluster is synced.
- Bring down (offline --enforce-counts) one of the RHEL7 nodes and replace it with a clean RHEL9 node. The config from /opt/splunk/etc was taken over from the old RHEL7 node.
- When all nodes were replaced, the "overflow" node was removed.

So, if cold buckets were not replicated, they were never replicated to the overflow node either and eventually were all gone.
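For reference, the distsearch.conf workaround mentioned above is just the recovered peer added as an extra (non-clustered) search peer on the search head. The hostname below is a placeholder and the port is the default management port:

    # distsearch.conf on the search head
    # (appended to the existing comma-separated servers list, if any)
    [distributedSearch]
    servers = https://recovered-peer.example.com:8089

And the dbinspect check was along these lines (index and server names are placeholders); path and state come back as the colddb path and 'cold':

    | dbinspect index=my_index
    | search splunk_server=recovered-peer state=cold
    | table bucketId, state, path, startEpoch, endEpoch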
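And for completeness, the RHEL7 to RHEL9 peer swap itself used roughly the standard commands, run on the peer being retired and on the master respectively:

    # On the RHEL7 peer being retired: take it offline and let the cluster
    # restore the replication/search counts before the peer shuts down
    splunk offline --enforce-counts

    # On the master: watch peer status and fix-up progress while this runs
    splunk show cluster-status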