Solved: Search factor vs Replication factor-If I change my SF=2 , does this mean 2 copies are changed to primary?

jiaqya · ‎04-22-2019

I know that setting RF=2 ensures 2 copies of each bucket on the available indexers, so this consumes 2X the disk space.
I also know that only the primary copy is searchable, i.e. SF=1. Is this the default setting for SF?

Question: if I change to SF=2, does this mean 2 copies are changed to primary, so 2 copies are searchable?
Does SF increase the space requirement when changed from 1 to 2?

Is this increase in space the same as for RF, i.e. double the space, or some percentage of it?

VatsalJagani · ‎04-22-2019

Hi @vishaltaneja07011993 ,

Let's understand search factor and replication factor.
Replication Factor - Number of copies of buckets.
Search Factor - Number of searchable copies of buckets.

"RF=2 & SF=1 consumes 2X the disk space" is wrong! It takes somewhat less space. With RF=2 and SF=2 it takes exactly 2X the disk space. Searchable buckets contain TSIDX files and a bloom filter in addition to the raw data, so a non-searchable copy is smaller than a searchable one. I hope that makes the space requirement clear.
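To put rough numbers on this, here is a minimal sketch of the arithmetic. The two ratios are assumptions (commonly quoted rules of thumb for compressed rawdata and index files relative to incoming data); real values vary a lot with the data, and `cluster_disk_gb` is a hypothetical helper, not a Splunk API.

```python
# Rough disk estimate for data stored in an indexer cluster.
# Both ratios are assumed rules of thumb, not measured values.
RAWDATA_RATIO = 0.15  # compressed rawdata as a fraction of incoming data (assumption)
INDEX_RATIO = 0.35    # tsidx and other index files as a fraction of incoming data (assumption)


def cluster_disk_gb(incoming_gb, replication_factor, search_factor):
    """Estimate total cluster disk usage for `incoming_gb` of ingested data."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    # Every copy of a bucket keeps the rawdata.
    rawdata = incoming_gb * RAWDATA_RATIO * replication_factor
    # Only searchable copies also keep the index files.
    index_files = incoming_gb * INDEX_RATIO * search_factor
    return rawdata + index_files


single_copy = cluster_disk_gb(100, 1, 1)
rf2_sf1 = cluster_disk_gb(100, 2, 1)
rf2_sf2 = cluster_disk_gb(100, 2, 2)
print(f"RF=1, SF=1: {single_copy:.0f} GB")
print(f"RF=2, SF=1: {rf2_sf1:.0f} GB ({rf2_sf1 / single_copy:.2f}x)")
print(f"RF=2, SF=2: {rf2_sf2:.0f} GB ({rf2_sf2 / single_copy:.2f}x)")
```

Under these assumed ratios, RF=2 with SF=1 lands well below twice the single-copy size, while RF=2 with SF=2 is exactly double.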

Coming to primary buckets: there is always only one primary copy of a bucket. It tells Splunk which copy of the bucket to search. If a search peer goes down, Splunk finds another searchable copy and makes it primary; if there is none, it makes a non-searchable copy searchable and then makes it primary.
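The failover order described above can be sketched roughly like this. It is only an illustration of the described behaviour; `BucketCopy` and `reassign_primary` are hypothetical names, and the real cluster logic is more involved.

```python
from dataclasses import dataclass


@dataclass
class BucketCopy:
    peer: str
    searchable: bool        # has index files (tsidx, bloom filter) next to the rawdata
    primary: bool = False


def reassign_primary(copies, failed_peer):
    """Return the surviving copies of one bucket, with exactly one marked primary."""
    survivors = [c for c in copies if c.peer != failed_peer]
    if not survivors:
        raise RuntimeError("no copy of the bucket survives")
    if any(c.primary for c in survivors):
        return survivors  # the primary was not on the failed peer
    # Prefer a copy that is already searchable.
    candidate = next((c for c in survivors if c.searchable), None)
    if candidate is None:
        # Otherwise a non-searchable copy has to be made searchable first.
        candidate = survivors[0]
        candidate.searchable = True
    candidate.primary = True
    return survivors


# RF=2, SF=1: the only searchable (and primary) copy lives on idx1.
copies = [BucketCopy("idx1", searchable=True, primary=True),
          BucketCopy("idx2", searchable=False)]
after = reassign_primary(copies, failed_peer="idx1")
print(after)
```

With SF=1 the surviving copy has to be made searchable before it can take over, which is the slower path; with SF=2 a searchable copy would already be waiting.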

I hope this clarifies the difference between RF and SF, and the importance of primary buckets.

jiaqya · ‎04-22-2019

Hi Vatsal, thanks for your reply.

I'm trying to understand this with the example below.

If the bucket size is 100GB, then RF=2 will result in 200GB, right?
If SF=1, then this includes index + rawdata + bloom filter, so more than 200GB, which is more than 2X, right?
Then if SF=2, it will be 2X of bucket + searchable.

Am I understanding this right?

I understand about primary buckets. So I can have a primary for each site, in case of multisite ...

VatsalJagani · ‎04-23-2019

Correct @jiaqya and @vishaltaneja07011993,

A bucket copy can be searchable or non-searchable.
Non-searchable = raw-data
Searchable = raw-data + tsidx + bloomfilter

RF = SF (searchable copies) + non-searchable copies
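As a small illustration of that breakdown, the sketch below lists what each copy of a bucket would hold for a given RF and SF. `describe_copies` is a hypothetical helper and the file names are simplified labels.

```python
def describe_copies(replication_factor, search_factor):
    """List the contents of each copy of a bucket for the given RF and SF."""
    if not 1 <= search_factor <= replication_factor:
        raise ValueError("search factor must be between 1 and the replication factor")
    searchable = ["rawdata", "tsidx", "bloomfilter"]
    non_searchable = ["rawdata"]
    # SF copies are searchable; the remaining RF - SF copies hold rawdata only.
    return ([list(searchable) for _ in range(search_factor)]
            + [list(non_searchable) for _ in range(replication_factor - search_factor)])


for number, contents in enumerate(describe_copies(3, 2), start=1):
    print(f"copy {number}: {' + '.join(contents)}")
```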

arunsunny · ‎11-07-2022

Hey @VatsalJagani

One doubt.

For example, SF=2 & RF=2 are set.

Suppose a searchable bucket copy is present on each site of a multisite (2 sites here) indexer cluster.

i.e.
Site1 = searchable bucket
Site2 = searchable bucket

Do both searchable copies act as the primary search copy in the indexer cluster?

VatsalJagani · ‎11-07-2022

@arunsunny - Regarding Search Factor & Replication Factor in multi-site clustering.

The multisite cluster uses different parameters with similar meanings:

site_replication_factor = origin:2, total:3
site_search_factor = origin:1, total:2

Explanation:

There will be a total of 3 copies of the _raw data (replication factor). Two (2) of them will be stored on the site where the data originated.
There will be a total of 2 searchable copies of the bucket (_raw data + bloom filter and other metadata files). One (1) of them is guaranteed to be stored on the site where the bucket was originally generated.
A primary copy of the bucket: There will be always one primary copy of the bucket, regardless of whether it's a single-site or multi-site cluster.
- The cluster master switches it very quickly if the server holding the primary copy goes down.
- The primary copy can be on a different site than where your search head is located. The cluster master determines that based on search affinity and other factors.
  - The cluster master tries to keep it on the site that gives better performance, but there is no guarantee it will be on the same site all the time.
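To make the two settings above easier to read, here is a minimal sketch that parses values of this form and runs one simplified sanity check (a search factor entry should not exceed the matching replication factor entry, since searchable copies are a subset of all copies). `parse_site_factor` and `check_factors` are hypothetical helpers, and Splunk's own validation has more rules than this.

```python
def parse_site_factor(value):
    """Parse a string like 'origin:2, total:3' into {'origin': 2, 'total': 3}."""
    result = {}
    for part in value.split(","):
        key, _, count = part.partition(":")
        result[key.strip()] = int(count.strip())
    return result


def check_factors(site_replication_factor, site_search_factor):
    """Return a list of problems found by a simplified consistency check."""
    replication = parse_site_factor(site_replication_factor)
    search = parse_site_factor(site_search_factor)
    problems = []
    for key, count in search.items():
        # Only compare entries that appear in both settings.
        if key in replication and count > replication[key]:
            problems.append(
                f"{key}: search factor {count} exceeds replication factor {replication[key]}"
            )
    return problems


print(parse_site_factor("origin:2, total:3"))
print(check_factors("origin:2, total:3", "origin:1, total:2"))
print(check_factors("origin:2, total:3", "origin:1, total:4"))
```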

arunsunny · ‎11-07-2022

@VatsalJagani - So from your statement "A primary copy of the bucket: There will be always one primary copy of the bucket, regardless of whether it's a single-site or multi-site cluster" - you are saying always one primary copy in the cluster.

But if you look at the below lines in the doc, we can keep more than one primary copy on different sites. https://docs.splunk.com/Documentation/Splunk/9.0.2/Indexer/Multisitesearchaffinity

"There are also ways to configure the site search factor to ensure that all sites have searchable copies, even without explicitly specifying some or all of them. For example, a three-site cluster with site_search_factor = origin:1, total:3 guarantees one searchable copy per site, and thus enables search affinity for each site. Each site will have primary copies of all buckets."

Let me know your thoughts on this.

Happy Splunking.

VatsalJagani · ‎11-08-2022

@arunsunny - Here is my understanding about a primary copy of a bucket from Consultant Training.

The search head always sends search requests to all the indexers (peers) on all the sites, regardless of single-site or multi-site, and regardless of whether affinity is enabled or not.
The primary bucket is how Splunk decides which bucket needs to be searched.
More than one primary bucket would mean duplicate data in the search results.
There can be multiple searchable copies of a bucket, but there will be only one primary bucket. That's how Splunk ensures there is no duplicate data in the results.
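A toy model of that deduplication, assuming a given search reads exactly one primary copy per bucket. The peer names, bucket names and `search` function are made up for illustration.

```python
# Events stored in each bucket (identical in every copy of that bucket).
events_by_bucket = {"bucket_a": ["e1", "e2"], "bucket_b": ["e3"]}

# (peer, bucket, is_primary) for every searchable copy in the cluster.
copies = [
    ("idx1", "bucket_a", True),
    ("idx2", "bucket_a", False),
    ("idx2", "bucket_b", True),
    ("idx3", "bucket_b", False),
]


def search(copies, events_by_bucket, primaries_only=True):
    """Collect events from each copy that answers the search."""
    results = []
    for peer, bucket, is_primary in copies:
        if is_primary or not primaries_only:
            results.extend(events_by_bucket[bucket])
    return results


print(search(copies, events_by_bucket))                        # ['e1', 'e2', 'e3']
print(search(copies, events_by_bucket, primaries_only=False))  # every event appears twice
```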

I hope this helps!!!

vishaltaneja070 · ‎04-23-2019

Correct, SF=2 means 2X the raw + index data files.

jiaqya · ‎04-23-2019

Thanks, that's what I wanted to know.

vishaltaneja070 · ‎04-22-2019

@jiaqya

Yes, correct, it will increase the disk space as well. RF on its own only stores raw data; with SF, copies of the index data are stored as well, so it requires more space.

Please find the below doc for better understanding:
https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Bucketsandclusters
