Can I control where the primary copy resides in a multisite indexer cluster setup?

Builder

Question:

I have SiteA and SiteB and plan to keep 2 copies, i.e. RF=2.

I would like a setup where forwarders send data to SiteA and the data is then replicated to SiteB.

Each site would maintain one copy.

Can I control where the primary copy resides in a multisite cluster? I would prefer it to reside on SiteA.

Is this possible, and if so, how can it be achieved?

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Communicator

Hi

You can configure the master node like this (server.conf):

[clustering]
# these replication settings are only honored on the master node
mode = master
multisite = true
site_replication_factor = origin:1, total:2
site_search_factor = origin:1, total:2
available_sites = site1, site2

On each indexer, set the site it belongs to (server.conf), i.e. site1 on the SiteA indexers and site2 on the SiteB indexers:

[general]
site = site1

Configure the universal forwarder (outputs.conf) so that it sends data only to the SiteA indexers:

[tcpout]
defaultGroup = mygroup
forwardedindex.filter.disable = true
useACK = true

[tcpout:mygroup]
server = idx1_site1:9997, idx2_site1:9997

The primary copy will be on SiteA. It will switch to SiteB if the indexer holding it on SiteA goes down. To prevent this, increase the replication factor to 3 so that SiteA keeps two copies: site_replication_factor = origin:2, total:3
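
A minimal sketch of that three-copy variant on the master node, assuming the same two sites as above. Raising site_search_factor to origin:2 is an assumption added here (so that the second SiteA copy is also searchable); it is not part of the original suggestion and it costs extra index storage on SiteA. SiteA also needs at least two indexers, since one indexer holds at most one copy of a bucket.

```
# server.conf on the master node (sketch; adjust to your environment)
[clustering]
mode = master
multisite = true
available_sites = site1,site2
# two copies on the origin site (site1), one on the remaining site (site2)
site_replication_factor = origin:2, total:3
# assumption: keep both origin copies searchable so a primary can stay on site1
site_search_factor = origin:2, total:3
```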

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Builder

Markus, thanks for your input. In this case, with RF=2 across SiteA and SiteB, is the full data set still available if SiteA is down, since the alternate copy is on SiteB?

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Communicator

Hi @jiaqya

Let's go through site_replication_factor = origin:1, total:2
origin:1 means that one copy of each bucket is kept on the site where the bucket was originally created. total:2 means that two copies of each bucket must exist in total. In your setup, this means that the second copy will be stored on site2.

site_replication_factor = <comma-separated string>
* Only valid for 'mode=master' and is only used if 'multisite=true'.
* This specifies the per-site replication policy for any given
  bucket represented as a comma-separated list of per-site entries.
* Currently specified globally and applies to buckets in all
  indexes.
* Each entry is of the form <site-id>:<positive integer> which
  represents the number of copies to make in the specified site
* Valid site-ids include two mandatory keywords and optionally
  specific site-ids from site1 to site63
* The mandatory keywords are:
  - origin: Every bucket has a origin site which is the site of
  the peer that originally created this bucket. The notion of
  'origin' makes it possible to specify a policy that spans across
  multiple sites without having to enumerate it per-site.
  - total: The total number of copies we want for each bucket.
* When a site is the origin, it could potentially match both the
  origin and a specific site term. In that case, the max of the
  two is used as the count for that site.
* The total must be greater than or equal to sum of all the other
  counts (including origin).
* The difference between total and the sum of all the other counts
  is distributed across the remaining sites.
* Example 1: site_replication_factor = origin:2, total:3
  Given a cluster of 3 sites, all indexing data, every site has 2
  copies of every bucket ingested in that site and one rawdata
  copy is put in one of the other 2 sites.
* Example 2: site_replication_factor = origin:2, site3:1, total:3
  Given a cluster of 3 sites, 2 of them indexing data, every
  bucket has 2 copies in the origin site and one copy in site3. So
  site3 has one rawdata copy of buckets ingested in both site1 and
  site2 and those two sites have 2 copies of their own buckets.
* Default: origin:2, total:3
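
Applied to the two-site setup in this thread, the rules above work out as follows. This is an annotated sketch only; it assumes all forwarders send to site1, so site1 is the origin site of every bucket.

```
# server.conf on the master node
[clustering]
site_replication_factor = origin:1, total:2
# origin:1 -> 1 copy on the site that created the bucket (site1 here)
# total:2  -> 2 copies overall
# remaining = total - origin = 2 - 1 = 1 copy, which goes to the
# only other site (site2)
```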

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Builder

So this means a duplicate copy is always at site2, which also means that if I lose site1, I still have all the data at site2.

I ask because, with 5 indexers on site1 and 5 on site2, I want to confirm that I can still access all the data even if I lose site1, i.e. all 5 of its indexers. Right?

Even with RF=2?

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Path Finder

Yes, the data on Site 2 will still be available for search when you run the search from Site 2. If the Site 1 search heads are still available, they can also get results from Site 2, provided you have disabled search affinity.
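
For reference, search affinity is disabled per search head by assigning it to the special site0. A minimal sketch of that search head's server.conf follows; the master_uri host is a placeholder, and the pass4SymmKey line that the cluster requires is left out here.

```
# server.conf on a search head (sketch)
[general]
# site0 turns off search affinity for this search head
site = site0

[clustering]
mode = searchhead
multisite = true
# placeholder host: point this at your own master node
master_uri = https://master.example.com:8089
```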

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Communicator

Yes, the data will be available on both sites. You can lose one site and still have all the data.

The search factor is configured with: site_search_factor = origin:1, total:2

So if site1 goes down, your search heads can still search all the data.

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Builder

Great, this is exactly what I wanted. Thanks.

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Builder

A couple of follow-up questions on this.

Can I also set up multisite clustering for the search heads, so that the Site1 search head searches only Site1 indexers and the Site2 search head searches only Site2 indexers? This may resolve an error I am seeing on my indexers related to "underlying storage issues" when too many searches are run.

Another question: how quick is cluster replication? If a bucket arrives on a Site1 indexer, is it replicated quickly to a Site2 indexer? I am trying to understand the time required.

Re: Can I control where the primary copy resides in a multisite indexer cluster setup?

Communicator

Hi @jiaqya

What you are looking for is Search Affinity. Have a look at this documentation:
https://docs.splunk.com/Documentation/Splunk/7.2.4/Indexer/Multisitesearchaffinity
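
As a rough sketch of what that page describes: search affinity follows from the site assigned to each search head in its server.conf. The fragment below assumes a search head located at Site1; a Site2 search head would use site2 instead. As far as the documentation describes it, affinity prefers local copies rather than strictly forbidding remote ones, so a search can still reach the other site when local searchable copies are missing.

```
# server.conf on the Site1 search head (fragment)
[general]
# searches from this search head prefer the searchable copies held on site1
site = site1
```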

Bucket replication is fast, but it needs sufficient network bandwidth between the sites. The default maximum bucket size is 750 MB (maxDataSize = auto), or 10 GB if you use maxDataSize = auto_high_volume (on 64-bit systems).
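
Bucket size is set per index in indexes.conf on the indexers. A minimal fragment, where the index name my_index is a placeholder and the other settings an index needs (homePath, coldPath, thawedPath) are omitted:

```
# indexes.conf (fragment)
[my_index]
# auto = 750 MB (default); auto_high_volume = 10 GB on 64-bit systems
maxDataSize = auto_high_volume
```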
