Splunk Search

Search factor vs. replication factor: if I change to SF=2, does this mean 2 copies become primary?

jiaqya
Builder

I know that setting RF=2 ensures 2 copies of each bucket on the available indexers, so this consumes 2X the disk space.
I also know that only the primary copy is searchable, i.e. SF=1. Is this the default setting for SF?

Question: if I change to SF=2, does this mean 2 copies become primary, so that 2 copies are searchable?
Does SF increase the space requirement when changed from 1 to 2?

Is this increase in space the same as for RF, i.e. double the space, or only some percentage of it?

1 Solution

VatsalJagani
SplunkTrust

Hi @vishaltaneja07011993 ,

Let's understand search factor and replication factor.
Replication Factor - Number of copies of each bucket.
Search Factor - Number of searchable copies of each bucket.

"RF=2 & SF=1 consume 2X the disk space" - Wrong!!! It takes somewhat less space than that. If RF=2 and SF=2, then it takes exactly 2X the disk space of a single searchable copy. Searchable buckets contain TSIDX files and a bloom filter in addition to the raw data. Based on that, I hope you can work out the space requirement.

Coming to primary buckets: there is always only one primary copy of a bucket. It tells Splunk which copy of the bucket to search. If a search peer goes down, Splunk finds another searchable copy and makes it primary; if none is found, it makes a non-searchable copy searchable and then makes it primary.
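The failover rule described above can be sketched as a small toy model. This is only an illustration of the described behaviour, not Splunk's actual implementation: the `Copy` class and `reassign_primary` function are hypothetical names, and setting `searchable = True` stands in for whatever work the cluster really does to make a copy searchable.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """One copy of a single bucket, held by one peer (hypothetical model)."""
    peer: str
    searchable: bool
    primary: bool = False


def reassign_primary(copies, down_peer):
    """Return the surviving copies, with one of them marked primary."""
    alive = [c for c in copies if c.peer != down_peer]
    if not alive or any(c.primary for c in alive):
        return alive  # nothing left, or the primary copy survived

    # Prefer a copy that is already searchable.
    candidate = next((c for c in alive if c.searchable), None)
    if candidate is None:
        # No searchable copy left: make a non-searchable copy searchable first.
        candidate = alive[0]
        candidate.searchable = True
    candidate.primary = True
    return alive


copies = [Copy("idx1", True, True), Copy("idx2", True), Copy("idx3", False)]
survivors = reassign_primary(copies, down_peer="idx1")
print([c.peer for c in survivors if c.primary])
```

In this run the primary moves from `idx1` to `idx2`, because `idx2` already holds a searchable copy.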

I hope this makes the difference between RF and SF clear, along with the importance of primary buckets.

jiaqya
Builder

Hi Vatsal, thanks for your reply.

I'm trying to understand this with the example below.

If the bucket size is 100GB, then RF=2 will result in 200GB, right?
If SF=1, then this also includes index + rawdata + bloom filter, so the total is more than 200GB, i.e. more than 2X, right?
Then if SF=2, it will be 2X of (bucket + searchable files).

Am I understanding this right?

I understand about primary buckets. So I can have a primary for each site, in the case of multisite?

VatsalJagani
SplunkTrust

Correct @jiaqya and @vishaltaneja07011993,

A bucket copy can be searchable or non-searchable.
Non-searchable = raw-data
Searchable = raw-data + tsidx + bloomfilter

RF = SF (searchable copies) + non-searchable copies
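To put numbers on this, here is a minimal sketch of the arithmetic. The function name `cluster_disk_gb` is hypothetical, and the sizes (15 GB of raw data and 35 GB of tsidx/bloom filter files for one bucket) are made-up illustrative values, not measurements; real ratios depend on the data. The only thing taken from the thread is the rule itself: every copy stores raw data, and only the searchable copies also store the index files.

```python
def cluster_disk_gb(rawdata_gb, index_gb, rf, sf):
    """Total disk used across the cluster for one bucket.

    rawdata_gb: size of the raw data in one copy
    index_gb:   size of the tsidx + bloom filter files in one searchable copy
    rf, sf:     replication factor and search factor
    """
    if not 1 <= sf <= rf:
        raise ValueError("expected 1 <= SF <= RF")
    # Every copy holds raw data; only SF of them also hold index files.
    return rf * rawdata_gb + sf * index_gb


RAW_GB, INDEX_GB = 15, 35  # illustrative assumption only

one_copy = cluster_disk_gb(RAW_GB, INDEX_GB, rf=1, sf=1)  # 50 GB
rf2_sf1 = cluster_disk_gb(RAW_GB, INDEX_GB, rf=2, sf=1)   # 65 GB
rf2_sf2 = cluster_disk_gb(RAW_GB, INDEX_GB, rf=2, sf=2)   # 100 GB

print(rf2_sf1 / one_copy, rf2_sf2 / one_copy)  # 1.3 2.0
```

With these assumed sizes, RF=2 with SF=1 costs 1.3X a single searchable copy, while RF=2 with SF=2 costs exactly 2X.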


arunsunny
Path Finder

Hey @VatsalJagani

One doubt.

For example, SF=2 and RF=2 are set.

Suppose a searchable copy of the bucket is present on each site of a multisite (2 sites here) indexer cluster, i.e.

Site1 = searchable bucket
Site2 = searchable bucket

Do both searchable buckets act as the primary search copy in the indexer cluster?

 


VatsalJagani
SplunkTrust

@arunsunny - Regarding search factor and replication factor in multisite clustering.

A multisite cluster uses different parameters with similar meanings:

  • site_replication_factor = origin:2, total:3
  • site_search_factor = origin:1, total:2

Explanation:

  • There will be a total of 3 copies of the _raw data (replication factor). Two (2) of them will be stored on the site where the data originated.
  • There will be a total of 2 searchable copies of the bucket (_raw data + bloom filter and other metadata files). One (1) of them is guaranteed to be stored on the site where the bucket was originally generated.
  • A primary copy of the bucket: there will always be one primary copy of the bucket, regardless of whether it's a single-site or multisite cluster.
    • If the server holding the primary copy goes down, the cluster master switches the primary to another copy very quickly.
    • The primary copy can be on a different site than where your search head is located. The cluster master determines that based on search affinity and other factors.
      • The cluster master tries to keep it on the site that gives better performance, but there is no guarantee that it stays on the same site all the time.
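As a rough illustration of how the two settings above translate into copy counts, here is a small sketch. The helper names `parse_site_factor` and `describe` are hypothetical (this is not a Splunk API), and the sketch only handles the `origin`/`total` form shown in this post.

```python
def parse_site_factor(value):
    """Parse a string such as 'origin:2, total:3' into {'origin': 2, 'total': 3}."""
    result = {}
    for part in value.split(","):
        key, _, count = part.strip().partition(":")
        result[key.strip()] = int(count)
    return result


def describe(site_replication_factor, site_search_factor):
    """Summarise the copy counts implied by the two settings."""
    rf = parse_site_factor(site_replication_factor)
    sf = parse_site_factor(site_search_factor)
    return {
        "copies_total": rf["total"],
        "copies_on_origin_site": rf["origin"],
        "copies_on_other_sites": rf["total"] - rf["origin"],
        "searchable_total": sf["total"],
        "searchable_on_origin_site": sf["origin"],
        "non_searchable_total": rf["total"] - sf["total"],
    }


summary = describe("origin:2, total:3", "origin:1, total:2")
for name, count in summary.items():
    print(f"{name}: {count}")
```

For the values in this post, that gives 3 copies in total (2 on the origin site, 1 elsewhere), of which 2 are searchable and 1 is raw data only.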

 


arunsunny
Path Finder

@VatsalJagani - So from your statement "A primary copy of the bucket: There will be always one primary copy of the bucket, regardless of whether it's a single-site or multi-site cluster", you are saying there is always exactly one primary copy in the cluster.

But if you look at the lines below from the docs, we can keep more than one primary copy, on different sites: https://docs.splunk.com/Documentation/Splunk/9.0.2/Indexer/Multisitesearchaffinity

"There are also ways to configure the site search factor to ensure that all sites have searchable copies, even without explicitly specifying some or all of them. For example, a three-site cluster with site_search_factor = origin:1, total:3 guarantees one searchable copy per site, and thus enables search affinity for each site. Each site will have primary copies of all buckets."

Let me know your thoughts on this.

Happy Splunking.

VatsalJagani
SplunkTrust

@arunsunny - Here is my understanding of the primary copy of a bucket, from Consultant Training.

  • The search head always sends search requests to all the indexers (peers) on all sites, regardless of whether the cluster is single-site or multisite and regardless of whether affinity is enabled.
  • The primary bucket is how Splunk decides which copy of a bucket needs to be searched.
  • More than one primary copy of a bucket would mean duplicate data in the search results.
  • There can be multiple searchable copies of a bucket, but only one of them is the primary. That's how Splunk ensures there is no duplicate data in the results.
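The duplicate-data argument can be shown with a toy model. It is only a sketch of the reasoning in the bullets above, not of Splunk internals: `run_search` and the dictionary layout are hypothetical, and each peer simply returns the events of the copies it holds as primary.

```python
def run_search(peers):
    """Collect events from every peer, using only copies flagged as primary."""
    results = []
    for copies in peers.values():
        for copy in copies:
            if copy["primary"]:
                results.extend(copy["events"])
    return results


events = ["e1", "e2"]  # the events stored in bucket b1

# Two searchable copies of the same bucket, exactly one of them primary.
one_primary = {
    "idx1": [{"bucket": "b1", "primary": True, "events": events}],
    "idx2": [{"bucket": "b1", "primary": False, "events": events}],
}

# Same layout, but both copies answer the same search as primary.
two_primaries = {
    "idx1": [{"bucket": "b1", "primary": True, "events": events}],
    "idx2": [{"bucket": "b1", "primary": True, "events": events}],
}

print(len(run_search(one_primary)))    # 2
print(len(run_search(two_primaries)))  # 4
```

With one primary the search returns each event once; with two primaries answering the same search, every event comes back twice.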

 

I hope this helps!!!


vishaltaneja070
Motivator

Correct, SF=2 means 2X the raw + index data files.

jiaqya
Builder

Thanks, that's what I wanted to know.


vishaltaneja070
Motivator

@jiaqya

Yes, correct, it will increase the disk space as well. RF on its own only requires the raw data to be stored; with SF, copies of the index files are stored as well, so it requires more space.

Please see the doc below for a better understanding:
https://docs.splunk.com/Documentation/Splunk/7.2.6/Indexer/Bucketsandclusters
