I know that setting RF=2 keeps 2 copies of each bucket on the available indexers, so this consumes 2X the disk space.
I also know that only the primary copy is searchable, i.e. SF=1. Is this the default setting for SF?
Question: if I change to SF=2, does this mean both copies become primary, so 2 copies are searchable?
Does SF increase the space requirement when changed from 1 to 2?
Is this increase the same as for RF, i.e. double the space, or only some percentage of it?
Hi @vishaltaneja07011993 ,
Let's understand search factor and replication factor.
Replication Factor - Number of copies of buckets.
Search Factor - Number of searchable copies of buckets.
"RF=2 & SF=1 consumes 2X the disk space" is not correct. It takes somewhat less than that. With RF=2 and SF=2 it takes exactly 2X the disk space. Searchable bucket copies contain TSIDX files and bloom filters in addition to the raw data. I hope that makes the space requirement clear.
Coming to primary buckets: there is always only one primary copy of a bucket. It tells Splunk which copy to search. If a search peer goes down, Splunk looks for another searchable copy and makes it primary; if none is found, it makes a non-searchable copy searchable and then makes it primary.
I hope this clarifies the difference between RF and SF, and the importance of primary buckets.
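To make the failover rule above concrete, here is a toy sketch in Python. It is only a model of the behaviour described in this post, not Splunk's actual implementation; `BucketCopy` and `reassign_primary` are made-up names, and flipping `searchable` to `True` stands in for the real work of rebuilding the index files.

```python
from dataclasses import dataclass


@dataclass
class BucketCopy:
    peer: str            # hypothetical indexer name holding this copy
    searchable: bool     # True if the copy has tsidx + bloom filter
    primary: bool = False


def reassign_primary(copies, failed_peer):
    """Return the surviving copies of one bucket with exactly one primary."""
    survivors = [c for c in copies if c.peer != failed_peer]
    if not survivors:
        raise RuntimeError("no copies of the bucket are left")
    if any(c.primary for c in survivors):
        return survivors  # the primary was not on the failed peer
    # Prefer a copy that is already searchable.
    candidate = next((c for c in survivors if c.searchable), None)
    if candidate is None:
        # Otherwise make a non-searchable copy searchable first
        # (assumption: this models rebuilding its index files).
        candidate = survivors[0]
        candidate.searchable = True
    candidate.primary = True
    return survivors


copies = [BucketCopy("idx1", True, True), BucketCopy("idx2", False)]
for c in reassign_primary(copies, failed_peer="idx1"):
    print(c)
```

With RF=2 and SF=1 as in this example, losing the peer with the searchable copy forces the remaining raw-data-only copy to be made searchable before it can become primary.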
Hi Vatsal, thanks for your reply.
I'm trying to understand this with the example below.
If the bucket size is 100GB, then RF=2 will result in 200GB, right?
If SF=1, then this includes index + rawdata + bloom filter = > 200GB, which is more than 2X, right?
Then if SF=2, it will be 2X of bucket + searchable.
Am I understanding this right?
I understand about primary buckets. So I can have a primary for each site in the case of multisite?
Correct @jiagya and @vishaltaneja07011993,
A bucket copy can be searchable or non-searchable.
Non-searchable = raw-data
Searchable = raw-data + tsidx + bloomfilter
RF = SF (searchable buckets) + Non-searchable buckets
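As a rough sketch of what this breakdown means for disk space, the Python below adds up raw-data copies (one per RF) and index-file copies (one per SF). The function name is made up, and the 30 GB / 70 GB split of a 100 GB searchable bucket is purely an assumed example; the real ratio depends on your data.

```python
def cluster_storage_gb(rawdata_gb, index_gb, rf, sf):
    """Estimate total disk used by all copies of a bucket.

    rawdata_gb: size of the raw data in one copy
    index_gb:   size of tsidx + bloom filter in one searchable copy
    """
    if not 1 <= sf <= rf:
        raise ValueError("search factor must be between 1 and the replication factor")
    # Every copy stores raw data; only the searchable copies store index files.
    return rawdata_gb * rf + index_gb * sf


# Assumed split of a 100 GB searchable bucket (illustrative numbers only).
RAWDATA_GB = 30
INDEX_GB = 70

for rf, sf in [(1, 1), (2, 1), (2, 2)]:
    total = cluster_storage_gb(RAWDATA_GB, INDEX_GB, rf, sf)
    print(f"RF={rf} SF={sf}: {total} GB")
```

Under these assumed numbers, RF=2 with SF=1 comes to 130 GB (less than 2X), while RF=2 with SF=2 comes to exactly 200 GB.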
Hey @VatsalJagani
One doubt.
For example, SF=2 & RF=2 are set,
and a searchable bucket copy is present on each site of a multisite (2 sites here) indexer cluster,
i.e.
Site1 = searchable bucket
Site2 = searchable bucket
Do both searchable bucket copies act as primary search copies in the indexer cluster?
@arunsunny - Regarding Search Factor & Replication Factor in multi-site clustering.
The multisite cluster uses different parameters with similar meanings:
Explanation:
@VatsalJagani - So from your statement "A primary copy of the bucket: There will be always one primary copy of the bucket, regardless of whether it's a single-site or multi-site cluster", you are saying there is always one primary copy in the cluster.
But if you look at the lines below from the docs, we can keep more than one primary copy on different sites. https://docs.splunk.com/Documentation/Splunk/9.0.2/Indexer/Multisitesearchaffinity
"There are also ways to configure the site search factor to ensure that all sites have searchable copies, even without explicitly specifying some or all of them. For example, a three-site cluster with site_search_factor = origin:1, total:3 guarantees one searchable copy per site, and thus enables search affinity for each site. Each site will have primary copies of all buckets."
Let me know your thoughts on this.
Happy Splunking.
@arunsunny - Here is my understanding about a primary copy of a bucket from Consultant Training.
I hope this helps!!!
Correct, SF=2 means 2X the raw + index data files.
Thanks, that's what I wanted to know.
@jiaqya
Yes, correct, it will increase the disk space as well. RF on its own only stores copies of the raw data; with SF, copies of the index data are stored as well, so it requires more space.
Please see the doc below for a better understanding:
https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Bucketsandclusters