I have SiteA and SiteB and plan to keep 2 copies, i.e. RF=2.
I would like a setup where forwarders send data to SiteA and replication then occurs to SiteB.
Each site would hold one copy.
Can I control where the primary copy resides in a multisite cluster? I would prefer it to reside on SiteA.
Is this possible, and if so, how can it be achieved?
You can configure the master node like this (server.conf):
[clustering]
multisite = true
site_replication_factor = origin:1, total:2
site_search_factor = origin:1, total:2
available_sites = site1, site2
On each indexer you will have to configure its site (server.conf):
[general]
site = site1
You have to configure the universal forwarder so that it only sends the logs to SiteA (outputs.conf):
[tcpout]
defaultGroup = mygroup
forwardedindex.filter.disable = true
useACK = true

[tcpout:mygroup]
server = idx1_site1:9997, idx2_site1:9997
The primary copy will be on SiteA. It will switch to SiteB if the indexer holding it on SiteA is down. If you want to prevent this, you would have to increase the RF to 3 so that SiteA keeps two copies:
site_replication_factor = origin:2, total:3
Markus, thanks for your input. So in this case, with RF=2 across SiteA and SiteB, is the full data still available if SiteA is down, since the alternate copy is on SiteB?
Let's go through it:
site_replication_factor = origin:1, total:2
origin:1 means that one copy of the bucket is kept on the site where the bucket was originally created.
total:2 means that 2 copies of the bucket have to exist in total. In your setup this means that the other copy will be stored on site2.
site_replication_factor = <comma-separated string>
* Only valid for 'mode=master' and is only used if 'multisite=true'.
* This specifies the per-site replication policy for any given bucket represented as a comma-separated list of per-site entries.
* Currently specified globally and applies to buckets in all indexes.
* Each entry is of the form <site-id>:<positive integer> which represents the number of copies to make in the specified site
* Valid site-ids include two mandatory keywords and optionally specific site-ids from site1 to site63
* The mandatory keywords are:
  - origin: Every bucket has a origin site which is the site of the peer that originally created this bucket. The notion of 'origin' makes it possible to specify a policy that spans across multiple sites without having to enumerate it per-site.
  - total: The total number of copies we want for each bucket.
* When a site is the origin, it could potentially match both the origin and a specific site term. In that case, the max of the two is used as the count for that site.
* The total must be greater than or equal to sum of all the other counts (including origin).
* The difference between total and the sum of all the other counts is distributed across the remaining sites.
* Example 1: site_replication_factor = origin:2, total:3
  Given a cluster of 3 sites, all indexing data, every site has 2 copies of every bucket ingested in that site and one rawdata copy is put in one of the other 2 sites.
* Example 2: site_replication_factor = origin:2, site3:1, total:3
  Given a cluster of 3 sites, 2 of them indexing data, every bucket has 2 copies in the origin site and one copy in site3. So site3 has one rawdata copy of buckets ingested in both site1 and site2 and those two sites have 2 copies of their own buckets.
* Default: origin:2, total:3
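To make the rules quoted above concrete, here is a small Python sketch that parses a site_replication_factor string and works out how many copies each site holds for a bucket. It is not Splunk code: the function names parse_site_factor and copies_per_site are hypothetical, and treating the leftover as "total minus the copies already assigned" is my reading of the spec text, not confirmed cluster behaviour.

```python
def parse_site_factor(value):
    """Parse a string such as 'origin:1, total:2' into a dict of ints.

    Entries must be comma-separated; a missing comma makes int() fail.
    """
    entries = {}
    for item in value.split(","):
        key, _, count = item.strip().partition(":")
        entries[key.strip()] = int(count)
    return entries


def copies_per_site(value, origin_site, sites):
    """Return (copies assigned per site, copies left for the other sites).

    Hypothetical helper that follows the quoted spec text only.
    """
    factor = parse_site_factor(value)
    if "origin" not in factor or "total" not in factor:
        raise ValueError("both 'origin' and 'total' are mandatory")
    others = sum(n for key, n in factor.items() if key != "total")
    if factor["total"] < others:
        raise ValueError("total must be >= the sum of all other counts")

    copies = {}
    for site in sites:
        count = factor.get(site, 0)
        if site == origin_site:
            # The origin site may match both terms; the larger count is used.
            count = max(count, factor["origin"])
        copies[site] = count

    # Assumption: whatever is not pinned to a site goes to the remaining sites.
    remainder = factor["total"] - sum(copies.values())
    return copies, remainder


if __name__ == "__main__":
    print(copies_per_site("origin:1, total:2", "site1", ["site1", "site2"]))
```

For origin:1, total:2 with site1 as origin, one copy is pinned to site1 and one copy is left over; in a two-site cluster the only place that copy can go is site2, which is why each site ends up with a full copy.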
So this means a duplicate copy is always at Site2.
This also means that if I lose Site1, I still have all the data at Site2.
I ask because, if I have 5 indexers on Site1 and 5 on Site2, I can still access all the data even if I lose Site1, i.e. all 5 of its indexers, right?
Even when RF=2?
Yes, the Site2 copy will also be available for search when you run the search from Site2. If the Site1 search heads are still available, you can also get the results from Site2, provided you have disabled search affinity.
Yes, the data will be available on both sites. You can lose one site and still have all the data.
The search factor is configured with:
site_search_factor = origin:1, total:2
So if site1 goes down, your search heads are still able to search all the data.
A couple of follow-up questions on this.
Can I also have multisite clustering for search heads, so that the Site1 search head searches only the Site1 indexers and the Site2 search head searches only the Site2 indexers? This may resolve an error I am seeing on my indexers related to "underlying storage issues " when too many searches are run.
Another question: how quick is the cluster replication? If a bucket comes into a Site1 indexer, is it replicated quickly to a Site2 indexer? I am trying to understand the time required.
What you are looking for is Search Affinity. Have a look at this documentation:
The replication of a bucket is fast, but you need the network bandwidth for it. A default bucket size is 750 MB (maxDataSize = auto) and 10 GB if you use maxDataSize = auto_high_volume.
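As a rough back-of-the-envelope illustration of the bandwidth point, the Python sketch below estimates how long it takes to move one full bucket between sites. The function name transfer_seconds, the 1 Gbit/s link and the decimal units (1 MB = 10^6 bytes) are assumptions for illustration only; real replication time also depends on compression, concurrent traffic and how the cluster streams data, none of which is modelled here.

```python
def transfer_seconds(size_mb, link_mbps, usable_fraction=1.0):
    """Estimate seconds needed to move size_mb megabytes over a link.

    size_mb         -- payload size in MB (decimal, 10^6 bytes)
    link_mbps       -- nominal link speed in megabits per second
    usable_fraction -- share of the link actually available (0 < f <= 1)
    """
    if link_mbps <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("link_mbps must be > 0 and usable_fraction in (0, 1]")
    bits = size_mb * 8 * 1_000_000
    return bits / (link_mbps * 1_000_000 * usable_fraction)


if __name__ == "__main__":
    # Bucket sizes from the answer above; the 1000 Mbit/s link is assumed.
    for label, size_mb in (("auto", 750), ("auto_high_volume", 10_000)):
        print(label, round(transfer_seconds(size_mb, 1000), 1), "s")
```

Under these assumptions a 750 MB bucket needs about 6 seconds on a dedicated 1 Gbit/s link and a 10 GB bucket about 80 seconds; halving the usable bandwidth doubles both figures.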