Hard Disk Failure on One Indexer in a Cluster

mark_wymer
Path Finder

Hi all,

Our environment includes, amongst other things, a multisite indexer cluster spanning three sites. Each site has three indexers, making a total of nine indexers. We also have a replication factor of 3. On each indexer, the hot/warm and cold buckets are on separate filesystems.

On one of the indexers, the filesystem containing the cold buckets suffered a hard disk failure, which destroyed the entire filesystem.

My question is: when the disk/filesystem is repaired, will Splunk automatically rebuild the cold buckets from the replicated copies? If it does, will it do so when I start Splunk, or are there maintenance commands that I will need to issue?

Many thanks,
Mark.


nickhills
Ultra Champion

Hi Mark,

Once the filesystem is back (assuming it's just the index filesystem that was lost) and you can start the peer as normal, it should rejoin the cluster.
When it joins, it will report its list of cold buckets (none) to the CM.
The CM will take any steps necessary to bring the cluster back to health. However, if the cluster has already become consistent using the remaining 8 peers, nothing will need to be replicated.

This means that your restored peer will initially have few, if any, cold buckets. This is fine from a cluster health perspective, but it does mean that the indexer will not "pull its weight" for searches that include data in those cold buckets.

To restore an even distribution of buckets across all peers (recommended for optimum performance and fault tolerance), you should rebalance the cluster, which will copy buckets to that host from the surviving 8 peers.

https://docs.splunk.com/Documentation/Splunk/8.0.1/Indexer/Rebalancethecluster
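As a rough sketch of what that looks like from the CLI (run on the CM, not on the restored peer): check that the peer has rejoined, start the rebalance, then poll its status. The `/opt/splunk` default path and the `run` dry-run helper are assumptions added for illustration, not part of Splunk. By default the script only prints the commands; set `RUN=1` to execute them, and expect the CLI to prompt for admin credentials.

```shell
# Sketch only. Run on the CM; the install path below is an assumed default.
SPLUNK_BIN="${SPLUNK_HOME:-/opt/splunk}/bin/splunk"

# Hypothetical helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$@"
  fi
}

# 1. Confirm the restored peer is back and the cluster is healthy.
run "$SPLUNK_BIN" show cluster-status

# 2. Start a data rebalance across all peers.
run "$SPLUNK_BIN" rebalance cluster-data -action start

# 3. Check on progress (repeat as needed).
run "$SPLUNK_BIN" rebalance cluster-data -action status
```

A rebalance can take a long time on a large cluster, so it is usually worth starting it in a quiet period. It can be halted with `rebalance cluster-data -action stop` if it starts to affect search performance.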

If my comment helps, please give it a thumbs up!

mark_wymer
Path Finder

Thanks for the answer / confirmation Nick.
