Solved: Can I control where the primary copy resides in a ...

jiaqya · ‎02-20-2019

Question:

I have SiteA and SiteB and plan to keep 2 copies, i.e. RF=2.

I would like to use this setup where forwarders send data to SiteA, then the replication occurs to SiteB.

Each site would maintain one copy.

Can I control where the primary copy resides in case of multisite? I would prefer it to reside on SiteA.

Is this possible, and if so, how can it be achieved?

markusspitzli · ‎02-20-2019

Hi

You can configure the master node like this (server.conf):

[clustering]
multisite = true
site_replication_factor = origin:1, total:2
site_search_factor = origin:1, total:2
available_sites = site1, site2

On each indexer you will have to configure its site (server.conf), using site1 for the indexers in SiteA and site2 for those in SiteB:

[general]
site = site1
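
For the indexers in SiteB, the same file would look roughly like the sketch below. The host name master.example.com and the key placeholder are hypothetical and not from this thread; mode = slave is the peer mode name used in the Splunk 7.x releases discussed here.

```ini
# server.conf on each indexer located in SiteB (illustrative sketch)
[general]
site = site2

[clustering]
mode = slave
# Hypothetical master node address; replace with your own
master_uri = https://master.example.com:8089
# Placeholder; must match the key configured on the master node
pass4SymmKey = <your_cluster_key>

# Port on which this peer receives replicated bucket data
[replication_port://9887]
```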

You have to configure the universal forwarder so that it only sends the logs to SiteA (outputs.conf):

[tcpout]
defaultGroup = mygroup
forwardedindex.filter.disable = true
useACK = true

[tcpout:mygroup]
server = idx1_site1:9997, idx2_site1:9997

The primary copy will be on SiteA. It will switch to SiteB if the indexer on SiteA is down. If you want to prevent this, you would have to increase the RF to 3: site_replication_factor = origin:2, total:3
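
As a rough sketch, the master node settings for that RF=3 variant could look like this. It assumes the search factor is left as it was, which is my assumption rather than something stated above. Note that origin:2 requires at least two indexers in SiteA, since each copy of a bucket lives on a different peer.

```ini
# server.conf on the master node (illustrative sketch for RF=3)
[clustering]
multisite = true
available_sites = site1, site2
# Two copies stay in the site where the bucket was created, one goes to the other site
site_replication_factor = origin:2, total:3
# Assumption: search factor unchanged from the RF=2 setup
site_search_factor = origin:1, total:2
```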

jiaqya · ‎02-22-2019

Markus, thanks for your input. So in this case, with RF=2 across SiteA and SiteB, is the full data still available if SiteA is down, since the alternate copy is on SiteB?

markusspitzli · ‎02-23-2019

Hi @jiaqya

Let's go through site_replication_factor = origin:1, total:2
Here origin:1 means that one copy of the bucket is kept on the site where the bucket was originally created. total:2 means that 2 copies of the bucket have to exist in the cluster. With your two sites, this means that the other copy will be stored on site2.

site_replication_factor = <comma-separated string>
* Only valid for 'mode=master' and is only used if 'multisite=true'.
* This specifies the per-site replication policy for any given
  bucket represented as a comma-separated list of per-site entries.
* Currently specified globally and applies to buckets in all
  indexes.
* Each entry is of the form <site-id>:<positive integer> which
  represents the number of copies to make in the specified site
* Valid site-ids include two mandatory keywords and optionally
  specific site-ids from site1 to site63
* The mandatory keywords are:
  - origin: Every bucket has a origin site which is the site of
  the peer that originally created this bucket. The notion of
  'origin' makes it possible to specify a policy that spans across
  multiple sites without having to enumerate it per-site.
  - total: The total number of copies we want for each bucket.
* When a site is the origin, it could potentially match both the
  origin and a specific site term. In that case, the max of the
  two is used as the count for that site.
* The total must be greater than or equal to sum of all the other
  counts (including origin).
* The difference between total and the sum of all the other counts
  is distributed across the remaining sites.
* Example 1: site_replication_factor = origin:2, total:3
  Given a cluster of 3 sites, all indexing data, every site has 2
  copies of every bucket ingested in that site and one rawdata
  copy is put in one of the other 2 sites.
* Example 2: site_replication_factor = origin:2, site3:1, total:3
  Given a cluster of 3 sites, 2 of them indexing data, every
  bucket has 2 copies in the origin site and one copy in site3. So
  site3 has one rawdata copy of buckets ingested in both site1 and
  site2 and those two sites have 2 copies of their own buckets.
* Default: origin:2, total:3

jiaqya · ‎02-24-2019

So this means a duplicate copy is always at Site2, which also means that if I lose Site1, I still have all the data at Site2.

I ask because, if I have 5 indexers on Site1 and 5 on Site2, I would still have access to all the data even if I lose Site1, i.e. all 5 of its indexers, right?

Even with RF=2?

markusspitzli · ‎02-24-2019

Yes, the data will be available on both sites. You can lose one site and still have all the data.

The search factor is configured with: site_search_factor = origin:1, total:2

So if site1 goes down, your search heads are still able to search all the data.

jiaqya · ‎02-26-2019

A couple of follow-up questions on this.

Can I also have multisite clustering for search heads, so that the Site1 search head searches only Site1 indexers and the Site2 search head searches only Site2 indexers? This may resolve an error I am seeing on my indexers related to "underlying storage issues" when too many searches are run.

Another question: how quick is cluster replication? If a bucket comes into a Site1 indexer, is it replicated quickly to a Site2 indexer? I am trying to understand the time required.

markusspitzli · ‎02-26-2019

Hi @jiaqya

What you are looking for is Search Affinity. Have a look at this documentation:
https://docs.splunk.com/Documentation/Splunk/7.2.4/Indexer/Multisitesearchaffinity
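
As a minimal sketch of what that looks like on a search head in SiteB (server.conf): the host name master.example.com and the key placeholder are hypothetical, so adjust them to your environment. With a site assigned, the search head prefers the searchable copies on its own site when that site can serve them.

```ini
# server.conf on a search head located in SiteB (illustrative sketch)
[general]
# Site this search head belongs to; setting site0 instead disables search affinity
site = site2

[clustering]
mode = searchhead
multisite = true
# Hypothetical master node address; replace with your own
master_uri = https://master.example.com:8089
# Placeholder; must match the key configured on the master node
pass4SymmKey = <your_cluster_key>
```

For this to keep searches local, each site needs a full set of searchable copies, which site_search_factor = origin:1, total:2 provides in a two-site cluster.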

Replication of a bucket is fast, but you need the network bandwidth for it. The default bucket size is 750 MB (maxDataSize = auto), or 10 GB if you use maxDataSize = auto_high_volume.

jiaqya · ‎03-01-2019

OK, thanks. So the bucket size then depends on the bandwidth we have between sites, which we can determine by testing.

Thanks again, Mark.

jiaqya · ‎02-25-2019

Great, this is exactly what I wanted. Thanks.

jvishwak · ‎02-24-2019

Yes, the Site 2 data will also be available for search when you run the search from Site 2. If the Site 1 search heads are available, you can also get results from Site 2 through them, provided you have disabled search affinity.
