Multi-site clustering without replicating across sites?

Contributor

Here is the scenario:

  • 2 Sites: Site 1 and Site 2
  • Site 1 has 4 peers and indexes non-sensitive data
  • Site 2 has 2 peers and indexes sensitive data
  • There are search heads at each site
  • The search head at Site 1 is able to search across data at both sites

I want to keep 2 copies of the data within each site to provide per-site redundancy, and I was thinking a good way to do this would be multisite clustering, with the master node in Site 1.

Clearly, multisite clustering was designed mainly to replicate copies of data across sites for redundancy. In this case I do not want that to happen, because the sensitive data in Site 2 must stay in Site 2. I also don't want the data from Site 1 going to Site 2.

From looking in the documentation, I am wondering if this type of configuration scenario would work:

site_replication_factor = origin:2, site1:0, site2:0, total:2

If I am reading the documentation right, the above configuration would keep 2 copies at the origin site and not push any data across sites. Is that correct?

Thanks!

1 Solution

Splunk Employee

Multisite clustering is designed to let you specify how data is replicated across sites, not where data must NOT be stored. So if data sensitivity is a key requirement, you should not depend on simple origin:2,total:2 type settings.

So your use case is you do not want Site 1 data to replicate to Site 2 or vice versa. In this case you don't need multisite clustering at all.

What you need is:

-- Set up a single-site cluster in Site 1 that includes only the Site 1 indexers. Add a search head (SH1) that searches Site 1.

-- Set up a single-site cluster in Site 2 that includes only the Site 2 indexers. Add a search head (SH2) that searches Site 2.

-- If you want to search both sites, set up another search head (SH3) that searches both clusters.

This approach provides the strongest guarantee that sensitive data doesn't leave a site, even accidentally.
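
For illustration, here is a minimal sketch of one of the two single-site clusters, assuming the older setting names (mode = master / slave, master_uri) used by Splunk versions of this era. The hostname cm-site1.example.com, the replication port 9887, and the key placeholder are hypothetical; Site 2 would repeat the same pattern with its own cluster master and peers.

```ini
# server.conf on the Site 1 cluster master
# (hypothetical host: cm-site1.example.com)
[clustering]
mode = master
# Two copies of every bucket, both held by Site 1 peers
replication_factor = 2
search_factor = 2
pass4SymmKey = <site1-cluster-key>

# server.conf on each Site 1 peer (indexer)
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://cm-site1.example.com:8089
pass4SymmKey = <site1-cluster-key>
```

Because each cluster master only manages the peers registered with it, neither cluster has a replication target in the other site.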

Path Finder

I would like to be able to do the same sort of thing: replicate data within a cluster only for some indexes.

So some indexes would get copies only within a site, and some between sites. I am hoping this may have changed since the post.

There is an indexes.conf parameter, repFactor, that controls whether an index is replicated (0 disables replication, auto uses the cluster's replication factor), but not where the copies go.

An example is below.

[npac_misc]
coldPath = $SPLUNK_DB/npac_misc/colddb
homePath = $SPLUNK_DB/npac_misc/db
thawedPath = $SPLUNK_DB/npac_misc/thaweddb
repFactor = auto

Path Finder

I have almost the same "no replication between sites" need, but in my case I'm intending to use a multisite cluster as a way to make better use of disk space across nodes:

1 master, 4 sites, each site with 2 nodes and 7.2 TB of disk space on each node.

I don't want to replicate logs between sites, but I do want to use the same index name on all sites.

The total space for logs would be 4 sites x 7.2 TB = 28.8 TB (with 2 copies kept on each site's 2 nodes, each site holds one node's worth of unique data), and I could control the replication path instead of only setting a replication factor of 2 with 8 nodes in a single cluster.

Also, this way the search heads connected to this master would receive search results from all nodes just by using an "index=indexname" query.

Is that possible?

Engager

Did you manage to solve the task? I need the same.

Contributor

Makes sense - thanks

Splunk Employee

You can avoid extra search heads by configuring either SH1 or SH2 to search both sites.
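
As a rough sketch of what that could look like, a search head can be pointed at more than one cluster master in server.conf. This assumes the older setting names (mode = searchhead, master_uri); the stanza labels site1 and site2, the hostnames, and the key placeholders are hypothetical.

```ini
# server.conf on the search head that should search both clusters
[clustering]
mode = searchhead
# Each entry refers to a [clustermaster:<label>] stanza below
master_uri = clustermaster:site1, clustermaster:site2

[clustermaster:site1]
master_uri = https://cm-site1.example.com:8089
pass4SymmKey = <site1-cluster-key>

[clustermaster:site2]
master_uri = https://cm-site2.example.com:8089
pass4SymmKey = <site2-cluster-key>
```

The search head only queries the peers of each cluster; it does not cause any data to be replicated between them.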

As for the cluster master: yes, you need an additional one to manage the other cluster. A cluster master can be hosted on a VM; it doesn't require a lot of horsepower.

Contributor

Thanks Mahamed, this makes sense. However, it also requires an extra search head, an extra cluster master node, and additional overhead. I was therefore looking to see if there might be something a little more efficient from an infrastructure/management point of view.
