What is the reason behind the default RF=3 and SF=2? Why does Splunk recommend it? What happens if we set RF=100?
Hi
RF (replication factor) is how many copies of each bucket Splunk keeps available across the cluster's peer nodes. A copy can be searchable (raw data with metadata) or non-searchable (raw data without metadata, which cannot be searched until the metadata has been rebuilt). SF (search factor) is how many of those copies are searchable. A non-searchable copy cannot be used in searches before its metadata has been built again.
Searchable copies also have the concept of primary and "other". Only a copy whose status is primary is used for searches. The CM manages this status between the copies (e.g. when a peer restarts, when a new bucket is created, etc.). Basically this means that even with SF=2, only one copy of a bucket is searched at a time. If SF=1 and the node that holds the searchable copy goes down or becomes unavailable, SHs cannot access that data until another node (once the CM tells it to) has rebuilt its replica into a searchable copy. That can take some time, depending on your nodes' HW, storage and bucket sizes. For that reason it's best to have SF at least 2.
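To make the SF=1 vs SF=2 difference concrete, here is a toy model in Python. It is only a sketch of the idea described above, not how the CM is actually implemented; the `BucketCopy` class, the `fail_peer` function and the peer names are all made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BucketCopy:
    """One copy of a single bucket on one peer (hypothetical model)."""
    peer: str
    searchable: bool
    primary: bool = False


def fail_peer(copies, failed_peer):
    """Drop the copies on failed_peer and report what has to happen next.

    Returns (remaining_copies, action), where action is one of:
      "no change" - the primary copy was not on the failed peer
      "promote"   - another searchable copy takes over as primary at once
      "rebuild"   - only raw copies remain; one must be made searchable first
      "lost"      - no copy of the bucket is left at all
    """
    remaining = [c for c in copies if c.peer != failed_peer]
    if not remaining:
        return remaining, "lost"
    if any(c.primary for c in remaining):
        return remaining, "no change"
    for i, c in enumerate(remaining):
        if c.searchable:
            remaining[i] = replace(c, primary=True)
            return remaining, "promote"
    return remaining, "rebuild"


# RF=3, SF=2: a second searchable copy is ready to take over.
sf2 = [BucketCopy("idx1", True, True), BucketCopy("idx2", True), BucketCopy("idx3", False)]
# RF=3, SF=1: the other two copies are raw only.
sf1 = [BucketCopy("idx1", True, True), BucketCopy("idx2", False), BucketCopy("idx3", False)]

print(fail_peer(sf2, "idx1")[1])
print(fail_peer(sf1, "idx1")[1])
```

With SF=2 the loss of idx1 only needs a status change on another copy, while with SF=1 the data stays unsearchable until a rebuild has finished.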
These days, when HW and OSes are good enough (they don't crash too often), I prefer SF=2, RF=3 in all reasonably sized clusters. In most cases, using bigger values is just a waste of storage.
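As for RF=100: a rough storage estimate shows why that is wasteful. The sketch below assumes the commonly quoted rule of thumb that compressed raw data takes about 15% of the ingested volume and the index files about 35%; those ratios, the function name and the example numbers (100 GB/day, 90 days retention) are assumptions for illustration, and real ratios vary a lot by data type. Also keep in mind that, as far as I know, the cluster needs at least as many peers as the RF, so RF=100 would also mean at least 100 indexers.

```python
def cluster_storage_gb(daily_ingest_gb, retention_days, rf, sf,
                       raw_ratio=0.15, index_ratio=0.35):
    """Rough total disk usage across the whole cluster, in GB.

    raw_ratio and index_ratio are rule-of-thumb assumptions:
    every one of the rf copies stores the compressed raw data,
    but only the sf searchable copies also store index files.
    """
    if sf > rf:
        raise ValueError("SF cannot be larger than RF")
    ingested = daily_ingest_gb * retention_days
    raw_total = ingested * raw_ratio * rf
    index_total = ingested * index_ratio * sf
    return raw_total + index_total


for rf, sf in [(3, 2), (100, 2)]:
    total = cluster_storage_gb(100, 90, rf, sf)
    print(f"RF={rf}, SF={sf}: about {total:,.0f} GB")
```

Under these assumptions RF=3/SF=2 needs roughly 10 TB, while RF=100/SF=2 needs roughly 141 TB for the same data, without making searches any more available than SF=2 already does.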
Of course it depends on your requirements for data availability in searches/alerts, so sometimes there can be reasons to keep those values higher than normal.
And if you are using SmartStore, you must keep SF=RF=x.
r. Ismo
https://community.splunk.com/t5/Splunk-Search/Search-factor-vs-Replication-factor/m-p/421544
The discussion linked above goes through this. It's very useful and should answer most of the questions you have.