Solved: Replication Factor with N+1 indexer

anthonypradal · ‎08-09-2018

Hello,

I would like to understand how data replication works in our current setup.

We have set the replication factor to 3 and deployed a cluster of 5 indexers. Where is the data stored? Is it like RAID 5, or something else?

Could I lose 2 servers and still guarantee our data integrity?

Thank you

sudosplunk · ‎08-09-2018

The data is stored across your cluster randomly.

Let me start with some basic definitions:

Source node: The source node ingests data from forwarders or other external sources.
Target node: The target node receives streams of replicated data from the source nodes.

With respect to storing replicated data, you cannot currently specify which nodes will receive replicated data. The master determines that on a bucket-by-bucket basis, and the behavior is not configurable. You must assume that all the peer nodes will serve as targets.

With a replication factor of 3, each source peer streams copies of its data to two target peers at any given time, but each time it starts a new hot bucket, its set of target peers can change.
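To picture the per-bucket target selection, here is a minimal Python sketch. It is only an illustration: the peer names and the pick_targets helper are made up, and the uniform random choice is an assumption, since the master's real selection logic is internal to Splunk and not configurable.

```python
import random

def pick_targets(source, peers, replication_factor, rng):
    """Pick (replication_factor - 1) target peers for a new hot bucket.

    Hypothetical helper: random choice stands in for the master's
    real, non-configurable selection logic.
    """
    candidates = [p for p in peers if p != source]
    return rng.sample(candidates, replication_factor - 1)

peers = ["idx1", "idx2", "idx3", "idx4", "idx5"]  # made-up peer names
rng = random.Random(42)

# The source keeps one copy itself, so RF=3 means two targets per hot bucket.
targets_per_bucket = [pick_targets("idx1", peers, 3, rng) for _ in range(4)]
for bucket_id, targets in enumerate(targets_per_bucket):
    print(f"hot bucket {bucket_id}: idx1 -> {sorted(targets)}")
```

Each new hot bucket gets its own pair of targets, which is why every peer has to be treated as a potential target.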

Could I lose 2 servers and still guarantee our data integrity?

Short answer: yes.

The cluster can tolerate a failure of (replication factor - 1) peer nodes. For example, a replication factor of 3 means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of data available on the remaining peers.
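If you want to convince yourself of the (replication factor - 1) rule, the following Python sketch checks it by brute force for a 5-peer cluster with a replication factor of 3. The peer names, bucket count, and random placement are assumptions for illustration; this models only where copies live, not how Splunk actually places or repairs them.

```python
import itertools
import random

def place_buckets(peers, replication_factor, num_buckets, seed=0):
    """Put each bucket's copies on `replication_factor` distinct peers.

    Random placement is an assumption standing in for the master's choice.
    """
    rng = random.Random(seed)
    return {b: set(rng.sample(peers, replication_factor))
            for b in range(num_buckets)}

def survives(placement, failed):
    """True if every bucket still has at least one copy on a live peer."""
    failed = set(failed)
    return all(copies - failed for copies in placement.values())

peers = ["idx1", "idx2", "idx3", "idx4", "idx5"]  # made-up peer names
placement = place_buckets(peers, replication_factor=3, num_buckets=200)

# Try every possible pair of failed peers: 3 copies minus 2 failures
# always leaves at least one copy of every bucket.
two_down_ok = all(survives(placement, pair)
                  for pair in itertools.combinations(peers, 2))

# Try every possible trio: some trio holds all three copies of a bucket,
# so losing three peers is not guaranteed to be safe.
three_down_ok = all(survives(placement, trio)
                    for trio in itertools.combinations(peers, 3))

print("safe with any 2 peers down:", two_down_ok)
print("safe with any 3 peers down:", three_down_ok)
```

The sketch ignores the repair activity the cluster performs after a failure, so it only describes peers that are down at the same time.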

Bonus:

With a search factor of at least 2, the cluster is able to continue searching with little interruption if a peer node goes down. For example, say you specify a replication factor of 3 and a search factor of 2. The cluster will maintain three copies of all buckets on separate peers across the cluster, and two copies of each bucket will be searchable. Then, if a peer goes down and it contains a bucket copy that has been participating in searches, a searchable copy of that bucket on another peer can immediately step in and start participating in searches.

On the other hand, if the cluster's search factor is only 1 and a peer goes down, there will be a significant lag before searching can resume across the full set of cluster data.
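The difference between a search factor of 1 and 2 can be sketched the same way. In this simplified Python model (peer names, bucket count, and placement are assumptions), a bucket can keep serving searches right after a failure only if a searchable copy survives on another peer; otherwise it has to wait until a non-searchable copy is made searchable, which is where the lag comes from.

```python
import random

def place(peers, replication_factor, search_factor, num_buckets, seed=0):
    """Place each bucket on RF distinct peers; the first SF copies are searchable.

    Simplified model with assumed random placement, not Splunk's real logic.
    """
    rng = random.Random(seed)
    placement = {}
    for bucket in range(num_buckets):
        holders = rng.sample(peers, replication_factor)
        placement[bucket] = {
            "searchable": set(holders[:search_factor]),
            "non_searchable": set(holders[search_factor:]),
        }
    return placement

def buckets_waiting_for_rebuild(placement, failed_peer):
    """Buckets left with no searchable copy after `failed_peer` goes down."""
    return [bucket for bucket, copies in placement.items()
            if not (copies["searchable"] - {failed_peer})]

peers = ["idx1", "idx2", "idx3", "idx4", "idx5"]  # made-up peer names
sf2 = place(peers, replication_factor=3, search_factor=2, num_buckets=200)
sf1 = place(peers, replication_factor=3, search_factor=1, num_buckets=200)

for peer in peers:
    print(peer,
          "SF=2 waiting:", len(buckets_waiting_for_rebuild(sf2, peer)),
          "SF=1 waiting:", len(buckets_waiting_for_rebuild(sf1, peer)))
```

With a search factor of 2, no single failure leaves a bucket without a searchable copy. With a search factor of 1, every bucket depends on exactly one peer for searching, so whichever peer fails takes some buckets out of search until they are rebuilt.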

Hope this helps!

You can find more information here.

