What is the cluster bootstrap process in the context of Splunk SmartStore, and when should it be used?
I recently migrated from local storage to SmartStore. We have a large cluster deployment and are wondering whether we need to perform a cluster bootstrap.
Just to add some more information on bootstrapping. For large deployments, customers get concerned about running a bootstrap when the environment has millions of buckets.
As you know, bootstrapping ensures that buckets already present on the cluster are not created again.
Bootstrapping just lists all the buckets on S3 and then creates the ones that are not present on the cluster.
It is usually quick as well.
Hence, if the customer is only missing a few buckets on the cluster, we can initiate bootstrapping and it will create those buckets.
Now for the question: listing all buckets on S3 for 7 million buckets -> is that still OK / fairly safe / quick?
If we do want to discover these buckets, bootstrapping is currently the only option.
It is not supported per index.
The entire operation is detached from the usual operations of the CM; it is safe and quick as well.
The documentation is pretty clear here:
If you find it to be confusing or lacking in some way, then scroll down to the bottom and leave a comment with all the details that you can. Splunk's docs team rocks; they will get in touch with you through your splunk.com email and negotiate a correction so that you get what you need and so will everybody else who comes after you!
Cluster bootstrapping is the process by which a completely new cluster can discover buckets that are present in the external storage system and resume normal operations. This operation may be useful in the following scenarios: recovery from a catastrophic failure (the whole cluster burned down), or failure of the cluster master plus more than repFactor peers.
This process has two stages:
1. Discover all buckets
   1. The master requests a random Up peer to enumerate the list of buckets on remote storage for a given index
2. Introduce buckets to the cluster master (with repFactor 0)
   1. Let the cluster master use its logic to redistribute/rebalance the buckets across the cluster
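
To make the two stages concrete, here is a toy Python model of the idea (list what is on remote storage, then introduce only what the cluster master does not already know about). This is just an illustrative sketch: the function names, the dictionary layout, and the bucket IDs are all made up for the example and are not Splunk internals.

```python
# Toy model of the two bootstrap stages.
# All names and data structures here are hypothetical, not Splunk code.
from typing import Dict, Set


def discover_buckets(remote_store: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Stage 1: enumerate the bucket IDs found on remote storage, per index."""
    return {index: set(buckets) for index, buckets in remote_store.items()}


def introduce_missing(
    known: Dict[str, Set[str]], discovered: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Stage 2: keep only buckets the cluster master does not already track."""
    missing: Dict[str, Set[str]] = {}
    for index, buckets in discovered.items():
        new_buckets = buckets - known.get(index, set())
        if new_buckets:
            missing[index] = new_buckets
    return missing


# Hypothetical example: the cluster knows two of the four remote buckets.
remote = {"main": {"b1", "b2", "b3"}, "web": {"b7"}}
known = {"main": {"b1", "b2"}}
missing = introduce_missing(known, discover_buckets(remote))
print(missing)
```

In this model, buckets that already exist on the cluster are skipped, which matches the point above that bootstrapping does not create them again.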
On the CM, you can run the following command to start the bootstrap:
$SPLUNK_HOME/bin/splunk _internal call /services/cluster/master/control/control/init_recreate_index -method POST
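
If you would rather script this than use the CLI, the same endpoint can presumably be called over the management REST interface. The sketch below is untested against a real cluster and makes several assumptions: the management port is the default 8089, basic authentication is accepted, and the host name and credentials shown are placeholders. The helper names are mine, not part of any Splunk SDK.

```python
# Hedged sketch: POST to the bootstrap endpoint over the management port.
# Assumptions: port 8089, basic auth, placeholder host and credentials.
import base64
import ssl
import urllib.request

BOOTSTRAP_PATH = "/services/cluster/master/control/control/init_recreate_index"


def build_bootstrap_request(host, user, password, port=8089):
    """Build (but do not send) the POST request for the bootstrap endpoint."""
    url = f"https://{host}:{port}{BOOTSTRAP_PATH}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


def start_bootstrap(req, verify_tls=True):
    """Send the request and return the HTTP status code."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Only for lab setups that still use a self-signed certificate.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status


# Building the request has no side effects; nothing is sent here.
req = build_bootstrap_request("cm.example.com", "admin", "changeme")
print(req.get_method(), req.full_url)
```

Call `start_bootstrap(req)` only on the cluster master you actually intend to bootstrap, since the POST starts the operation for the whole cluster.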