What are the best site_replication_factor and site_search_factor values for multisite indexer clustering?

maniu1609
Path Finder

Hi Team,

We have been given 5 servers to configure as indexers. We're planning to put 3 indexers in site1 and 2 indexers in site2.

What are the best site_replication_factor and site_search_factor values to set on the cluster master?
A brief explanation would be appreciated if possible. It would be a good learning experience for me.

Thanks in advance!!

amitm05
Builder

@maniu1609

I'd suggest reading these two pages first:
https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/Sitereplicationfactor
https://docs.splunk.com/Documentation/Splunk/7.3.0/Indexer/Sitesearchfactor

They explain the concepts well and should help you make your decision.

At a high level, the replication and search factors are a trade-off between availability (of the data, and of search after a failure) and disk space.
The higher you set them, the more disk space you need, which is an important consideration if you ingest a lot of data. Weigh that cost against the level of data availability you need.
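To put the disk-space side of that trade-off into rough numbers, here is a minimal Python sketch. It assumes compressed rawdata takes about 15% of the ingested volume and that each searchable copy adds index files of about 35%; these are rule-of-thumb ratios, not measurements, so check them against Splunk's storage-sizing guidance and your own data. `estimate_cluster_storage_gb` is a hypothetical helper. Only the `total` values matter here, because `origin` changes where copies live, not how many there are.

```python
# Rough disk-space sketch for an indexer cluster.
# ASSUMPTION: rule-of-thumb ratios, not measured values for your data.
RAWDATA_RATIO = 0.15      # compressed rawdata as a fraction of ingested volume
INDEX_FILES_RATIO = 0.35  # index files per searchable copy, same basis


def estimate_cluster_storage_gb(daily_ingest_gb: float,
                                retention_days: int,
                                replication_total: int,
                                search_total: int) -> float:
    """Estimate total cluster disk usage in GB (hypothetical helper).

    Every copy (replication_total of them) stores rawdata; only the
    searchable copies (search_total of them) also store index files.
    """
    if search_total > replication_total:
        raise ValueError("search factor cannot exceed replication factor")
    ingested = daily_ingest_gb * retention_days
    rawdata = ingested * RAWDATA_RATIO * replication_total
    index_files = ingested * INDEX_FILES_RATIO * search_total
    return rawdata + index_files


if __name__ == "__main__":
    # Hypothetical workload: 100 GB/day kept for 90 days.
    for rf_total, sf_total in [(3, 2), (4, 3)]:
        gb = estimate_cluster_storage_gb(100, 90, rf_total, sf_total)
        print(f"total RF {rf_total}, total SF {sf_total}: ~{gb:,.0f} GB")
```

For that hypothetical workload, going from total:3/total:2 to total:4/total:3 raises the estimate from about 10,350 GB to about 14,850 GB, which is the kind of cost to weigh against the extra copy.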

For your specific environment, taking both disk space and data availability into account, I'd suggest:
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2

This means:
replication factor - the origin site (the site where the data was indexed) keeps 2 copies and the other site keeps 1 copy, for a total of 3 copies across the 2 sites.
search factor - the origin site keeps 1 searchable copy and the other site keeps 1 searchable copy, for a total of 2 searchable copies across the 2 sites.
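The per-site breakdown above can be sketched in Python for each possible origin site. This assumes, as the description does, that in a two-site cluster the copies beyond the `origin` requirement go to the other site; the cluster master decides actual placement itself, and `placement` is a hypothetical helper, not a Splunk API.

```python
def placement(origin_site: str, sites: tuple[str, str],
              rf: dict[str, int], sf: dict[str, int]) -> dict[str, dict[str, int]]:
    """Per-site copy counts for one bucket in a two-site cluster.

    ASSUMPTION: copies beyond the origin requirement land on the other
    site; this models the description above, not Splunk's placement logic.
    """
    if origin_site not in sites:
        raise ValueError(f"unknown site {origin_site!r}")
    other = sites[1] if origin_site == sites[0] else sites[0]
    result = {}
    for site, is_origin in ((origin_site, True), (other, False)):
        copies = rf["origin"] if is_origin else rf["total"] - rf["origin"]
        searchable = sf["origin"] if is_origin else sf["total"] - sf["origin"]
        result[site] = {"copies": copies, "searchable": searchable}
    return result


if __name__ == "__main__":
    RF = {"origin": 2, "total": 3}  # site_replication_factor = origin:2,total:3
    SF = {"origin": 1, "total": 2}  # site_search_factor = origin:1,total:2
    for origin_site in ("site1", "site2"):
        print(origin_site, placement(origin_site, ("site1", "site2"), RF, SF))
```

Because a peer never holds more than one copy of the same bucket, data indexed in site2 ends up with a copy on both of site2's peers under this layout.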

Please accept as answer and upvote if this helps. Thanks.

maniu1609
Path Finder

Thanks @amitm05 for your help. I'm glad you pointed me in the right direction. The site_replication_factor and site_search_factor values are clear to me now.

So having an odd number of indexers in site1 and an even number in site2 isn't a problem, but we need to choose site_replication_factor and site_search_factor carefully. Am I correct?

amitm05
Builder

Yes, that's correct.
Glad it helped. Could you please accept this as the answer?
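To make "choose the factors carefully" concrete, here is a small Python sketch that parses `origin:N,total:M` strings and checks them against a 3+2 layout. The checks reflect my understanding of the constraints (a peer holds at most one copy of a bucket, so each site that can be an origin needs at least `origin` peers and the cluster needs at least `total` peers; the search factor values may not exceed the matching replication factor values). The helper names are hypothetical, and this does not replace the validation the cluster master performs.

```python
def parse_site_factor(value: str) -> dict[str, int]:
    """Parse a value such as 'origin:2,total:3' into {'origin': 2, 'total': 3}."""
    parsed = {}
    for part in value.split(","):
        key, sep, number = part.strip().partition(":")
        if not sep:
            raise ValueError(f"expected key:number, got {part!r}")
        parsed[key.strip()] = int(number)
    return parsed


def check_site_factors(rf: str, sf: str, peers_per_site: dict[str, int]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks feasible.

    ASSUMPTION-based checks only; explicit per-site values such as
    'site1:1' are compared with that site's peer count.
    """
    rf_v, sf_v = parse_site_factor(rf), parse_site_factor(sf)
    problems = []
    # Search factor values must not exceed the matching replication factor values.
    for key in ("origin", "total"):
        if sf_v.get(key, 0) > rf_v.get(key, 0):
            problems.append(f"search factor {key}:{sf_v[key]} exceeds "
                            f"replication factor {key}:{rf_v.get(key, 0)}")
    # The whole cluster needs at least `total` peers to hold `total` copies.
    total_peers = sum(peers_per_site.values())
    if rf_v.get("total", 0) > total_peers:
        problems.append(f"total:{rf_v['total']} needs more than {total_peers} peers")
    # Every site may be an origin, so it needs at least `origin` peers.
    for site, peers in peers_per_site.items():
        needed = max(rf_v.get("origin", 0), rf_v.get(site, 0))
        if needed > peers:
            problems.append(f"{site} has {peers} peers but may need to hold {needed} copies")
    return problems


if __name__ == "__main__":
    layout = {"site1": 3, "site2": 2}
    print(check_site_factors("origin:2,total:3", "origin:1,total:2", layout))  # []
    print(check_site_factors("origin:3,total:4", "origin:1,total:2", layout))  # flags site2
```

With the suggested values the list comes back empty; with origin:3 the 2-peer site2 could not hold three copies of its own data. So the odd/even split matters less than how the factors relate to the smallest site.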
