Multi-site clustering without replicating across sites?

pj
Contributor

Here is the scenario -

  • 2 Sites: Site 1 and Site 2
  • Site 1 has 4 peers and indexes non-sensitive data
  • Site 2 has 2 peers and indexes sensitive data
  • There are search heads at each site
  • The search head at Site 1 is able to search across data at both sites

I want to keep 2 copies of the data within each site to provide per-site redundancy, and I was thinking that a good way to do this would be to leverage multi-site clustering, placing the master node in site 1.

Clearly, multi-site clustering was designed mainly to replicate copies of data across sites for redundancy. In this case, however, I do not want that to happen: the sensitive data in site 2 must stay in site 2, and I also don't want the data from site 1 going to site 2.

From looking in the documentation, I am wondering if this type of configuration scenario would work:

site_replication_factor = origin:2, site1:0, site2:0, total:2

If I am reading the documentation right, the above configuration would replicate 2 copies at the origin site, but not push the data across the sites, or no?
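
For context, here is a minimal sketch of where such a setting would live: the [clustering] stanza of server.conf on the master node. It uses the shorter origin:2,total:2 form rather than the explicit zero values above. The site_search_factor value and the pass4SymmKey placeholder are assumptions added for illustration and are not from the original question. This only shows the shape of the configuration being asked about; a reply later in this thread advises against relying on it to keep sensitive data in one site.

```ini
# server.conf on the master node (sketch, assuming the master sits in site 1)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# Keep both copies at the site where the data was first indexed
site_replication_factor = origin:2,total:2
# Assumed value for illustration; not part of the original question
site_search_factor = origin:1,total:1
# Placeholder secret; hypothetical
pass4SymmKey = <cluster-key>
```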

Thanks!

jimdiconectiv
Path Finder

I would like to be able to do the same sort of thing: replicate data within a cluster only for some indexes.

So some indexes would get copies only within a site, and some between sites. I am hoping this may have changed since the original post.

There is an indexes.conf parameter, repFactor, that lets you control whether an index is replicated, but not where. An example is below.

[npac_misc]
coldPath = $SPLUNK_DB/npac_misc/colddb
homePath = $SPLUNK_DB/npac_misc/db
thawedPath = $SPLUNK_DB/npac_misc/thaweddb
repFactor = auto
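
As a point of comparison, repFactor accepts only two values, 0 or auto, rather than a copy count: auto means the index follows the cluster's replication settings, and 0 turns replication off for that index. The sketch below contrasts the two; the second index name, local_only, is hypothetical and used only for illustration.

```ini
# indexes.conf on the peer nodes (sketch)

# Replicated according to the cluster-wide replication settings
[npac_misc]
coldPath = $SPLUNK_DB/npac_misc/colddb
homePath = $SPLUNK_DB/npac_misc/db
thawedPath = $SPLUNK_DB/npac_misc/thaweddb
repFactor = auto

# Hypothetical index that is not replicated at all
[local_only]
coldPath = $SPLUNK_DB/local_only/colddb
homePath = $SPLUNK_DB/local_only/db
thawedPath = $SPLUNK_DB/local_only/thaweddb
repFactor = 0
```

Neither value says which sites receive the copies, which is the limitation described above.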

theunf
Communicator

I have almost the same "no replication between sites" need, but in my case I'm intending to use a multisite cluster as a way to make better use of disk space across nodes:

1 master, 4 sites, each site with 2 nodes and 7.2 TB of disk space on each node.

I don't want to replicate logs between sites, but I do want to use the same index name on all sites.

The total space for logs would be 4 sites x 7.2 TB = 28.8 TB, and I could have a designed replication path instead of only setting a replication factor of 2 with 8 nodes in a single cluster.

Also, this way the search heads connected to this master would receive search results from all nodes just by using an "index=indexname" query.

Is that possible?

bondbig
Engager

Did you manage to solve the task? I need the same.

mahamed_splunk
Splunk Employee

Multisite clustering is designed to specify how you want to replicate data across sites, and not so much where NOT to store your data. So if data sensitivity is a key requirement, you should not depend on simple origin:2,total:2 type settings.

So your use case is you do not want Site 1 data to replicate to Site 2 or vice versa. In this case you don't need multisite clustering at all.

What you need is

-- Set up a single site cluster in Site 1, which includes indexers from site 1 only. Add a search head (SH1) which can search Site 1

-- Set up a single site cluster in Site 2, which includes indexers from site 2 only. Add a search head (SH2) which can search Site 2

-- If you want to search both sites, set up another search head (SH3) which can search both the sites.

This approach would provide the best guarantee that sensitive data doesn't leave a site even accidentally
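
The third step, a search head that searches both clusters, could be sketched in that search head's server.conf roughly as follows. The stanza labels, host names, port, and keys are all hypothetical placeholders, and newer Splunk versions use "manager" terminology in place of "master", so treat this as an outline to check against the documentation for your version rather than a definitive configuration.

```ini
# server.conf on SH3 (sketch; all names and URIs are hypothetical)
[clustering]
mode = searchhead
# Reference one stanza per cluster master
master_uri = clustermaster:site1, clustermaster:site2

[clustermaster:site1]
master_uri = https://cm-site1.example.com:8089
pass4SymmKey = <site1-key>

[clustermaster:site2]
master_uri = https://cm-site2.example.com:8089
pass4SymmKey = <site2-key>
```

Because each cluster has its own master and its own peers, no replication path exists between the sites; only search requests and results cross the boundary.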

pj
Contributor

Makes sense - thanks

mahamed_splunk
Splunk Employee

You can avoid extra search heads by configuring either SH1 or SH2 to search both sites.

As for the cluster master: yes, you need an additional one to monitor the other cluster. A cluster master can be hosted on a VM; it doesn't require a lot of firepower.
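
For a sense of how small the per-site master configuration is, here is a minimal sketch of a single-site cluster for one site: one server.conf for its master and one for each of its peers. The host name, ports, key, and search_factor value are assumptions for illustration; replication_factor = 2 reflects the two copies per site requested in the original question. Newer Splunk versions rename these modes and settings (manager and peer), so check the names against your version.

```ini
# server.conf on the Site 2 cluster master (sketch)
[clustering]
mode = master
replication_factor = 2
# Assumed value for illustration
search_factor = 2
pass4SymmKey = <site2-key>
```

```ini
# server.conf on each Site 2 peer (sketch; host and ports are hypothetical)
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://cm-site2.example.com:8089
pass4SymmKey = <site2-key>
```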

pj
Contributor

Thanks Mahamed, this makes sense. However, it also requires an extra search head, an extra cluster master node, and additional overhead. Therefore, I was looking to see if there might be something a little more efficient from an infrastructure/management point of view.
