Our eight indexers are under enormous stress, and I wonder whether a replication factor of three makes sense. Since it's a major design consideration, what do you think? Is there a way to quantify the risk, and to assess the effectiveness of a replication factor of 1, 2, or 3?
OK, Hadoop uses 3 by default, but that's also being questioned quite a bit...
It is really based on your tolerance for how many indexers you can lose while still maintaining copies of all your data. A Replication Factor of 3 means you can sustain a loss of 2 indexers and still have access to all of your data.
It does become a trade-off between how many indexers you can tolerate losing and the storage costs associated with each additional copy.
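One rough way to put numbers on that trade-off is sketched below. It is a back-of-the-envelope model, not anything Splunk provides: the function names are made up for illustration, and it assumes each indexer is down independently with the same probability `p_fail` (a value you would have to estimate from your own outage history). It also takes the pessimistic view that any simultaneous loss of as many indexers as the replication factor could make some data unavailable. If failures are correlated, for example because load itself knocks indexers over, the real risk will be higher than this estimate.

```python
from math import comb


def prob_at_least_k_failures(n: int, k: int, p: float) -> float:
    """Probability that k or more of n indexers are down at once.

    Assumes independent failures with identical probability p
    (a simplifying assumption, not a measured property of any cluster).
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def replication_tradeoff(n_indexers: int, replication_factor: int, p_fail: float) -> dict:
    """Summarize the cost and risk of one replication factor.

    Hypothetical helper: storage_multiplier counts one full copy per
    replica, and the risk figure is the chance that at least
    `replication_factor` indexers are down together.
    """
    return {
        "replication_factor": replication_factor,
        "tolerated_losses": replication_factor - 1,
        "storage_multiplier": replication_factor,
        "p_data_possibly_unavailable": prob_at_least_k_failures(
            n_indexers, replication_factor, p_fail
        ),
    }


if __name__ == "__main__":
    # Eight indexers, as in the question; 0.01 is only a placeholder value.
    for rf in (1, 2, 3):
        print(replication_tradeoff(n_indexers=8, replication_factor=rf, p_fail=0.01))
```

Running it for replication factors 1, 2, and 3 shows how quickly the risk figure falls with each extra copy, which you can then weigh against the added storage and replication load.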
Interesting thing: it seems to me that a Replication Factor of 3 puts a huge strain on our system, which in its current state could itself cause 2 indexers to go down. With a Replication Factor of 2, we'll probably be very stable for now.
The other aspect is the nature of the data: in our case, not having all of the data available during a 2-indexer crash is probably acceptable.
Much appreciated.