Multi-site clustering without replicating across sites?

pj
Contributor

Here is the scenario -

  • 2 Sites: Site 1 and Site 2
  • Site 1 has 4 peers and indexes non-sensitive data
  • Site 2 has 2 peers and indexes sensitive data
  • There are search heads at each site
  • The search head at Site 1 is able to search across data at both sites

I want to keep 2 copies of the data at each site to provide redundancy within each site, and was thinking a good way to do this would be to leverage multi-site clustering, placing the master node in Site 1.

Clearly, multi-site clustering was designed mainly to replicate copies of data across sites for redundancy. However, in this case I do not want that to happen, because the sensitive data in Site 2 must stay in Site 2. I also don't want the data from Site 1 going to Site 2.

From looking in the documentation, I am wondering if this type of configuration scenario would work:

site_replication_factor = origin:2, site1:0, site2:0, total:2

If I am reading the documentation right, the above configuration would replicate 2 copies at the origin site but not push the data across sites. Is that correct?
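For context, here is a minimal sketch of where a setting like this would sit in server.conf on the master node. It uses the older-style multisite setting names (mode = master). The site names and factor values are illustrative assumptions, and the sketch does not by itself confirm that copies can never land on the other site.

```ini
# server.conf on the master node (sketch; values are assumptions)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
# Two copies at whichever site the data originates on, two copies in total
site_replication_factor = origin:2,total:2
# Searchable copies; assumed here to mirror the replication factor
site_search_factor = origin:2,total:2
```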

Thanks!

1 Solution

mahamed_splunk
Splunk Employee

Multisite clustering is designed to specify how you want to replicate data across sites, and not so much where NOT to store your data. So if data sensitivity is a key requirement, you should not depend on simple origin:2,total:2 type settings.

So your use case is that you do not want Site 1 data to replicate to Site 2, or vice versa. In this case you don't need multisite clustering at all.

What you need is:

-- Set up a single-site cluster in Site 1, which includes indexers from Site 1 only. Add a search head (SH1) which can search Site 1.

-- Set up a single-site cluster in Site 2, which includes indexers from Site 2 only. Add a search head (SH2) which can search Site 2.

-- If you want to search both sites, set up another search head (SH3) which can search both sites.

This approach would provide the best guarantee that sensitive data doesn't leave a site, even accidentally.
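As a rough sketch, one of the two single-site clusters described above could be configured along these lines in server.conf, using the older-style setting names (mode = master / slave). The host name, replication port, and key placeholder are assumptions for illustration only. Site 2 would repeat the same pattern with its own master node and its own key.

```ini
# server.conf on the Site 1 master node
# (hypothetical host: master-site1.example.com)
[clustering]
mode = master
# Two copies of each bucket, both kept inside this cluster
replication_factor = 2
search_factor = 2
pass4SymmKey = <site1-cluster-key>

# server.conf on each Site 1 peer (indexer)
# The replication port number is an assumption; any free port can be used
[replication_port://9887]

[clustering]
mode = slave
master_uri = https://master-site1.example.com:8089
pass4SymmKey = <site1-cluster-key>
```

Because each cluster has its own master node and only knows about its own peers, there is no replication path between the sites.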

jimdiconectiv
Path Finder

I would like to be able to do the same sort of thing: replicate data within a cluster only for some indexes.

So some indexes get copies only within sites, and some between sites. I am hoping this may have changed since the original post.

There is an indexes.conf parameter, repFactor, that lets you control whether an index is replicated (it accepts 0 or auto), but not where the copies go. An example is below.

[npac_misc]
coldPath = $SPLUNK_DB/npac_misc/colddb
homePath = $SPLUNK_DB/npac_misc/db
thawedPath = $SPLUNK_DB/npac_misc/thaweddb
repFactor = auto
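To illustrate the per-index toggle, here is a small sketch contrasting a replicated index with a non-replicated one. The index names are hypothetical. As far as I know, repFactor only switches replication on or off for an index (auto uses the cluster's replication factor, 0 disables replication), so it does not control which site receives the copies.

```ini
# indexes.conf on the cluster peers (index names are hypothetical)

# Replicated according to the cluster-wide replication factor
[replicated_example]
homePath = $SPLUNK_DB/replicated_example/db
coldPath = $SPLUNK_DB/replicated_example/colddb
thawedPath = $SPLUNK_DB/replicated_example/thaweddb
repFactor = auto

# Not replicated at all; each bucket stays on the peer that indexed it
[local_only_example]
homePath = $SPLUNK_DB/local_only_example/db
coldPath = $SPLUNK_DB/local_only_example/colddb
thawedPath = $SPLUNK_DB/local_only_example/thaweddb
repFactor = 0
```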

0 Karma

theunf
Communicator

I have almost the same "no replication between sites" need, but in my case I intend to use a multisite cluster as a way to make better use of the disk space across nodes:

1 master, 4 sites, each site with 2 nodes and 7.2 TB of disk space on each node.

I don't want to replicate logs between sites, but I do want to use the same index name on all sites.

The total space for logs would be 4 (sites) x 7.2 TB = 28.8 TB, and I could have a designed replication path instead of only setting a replication factor of 2 with 8 nodes in a single cluster.

Also, this way the search heads connected to this master would receive search results from all nodes just by using an "index=indexname" query.

Is that possible?

0 Karma

bondbig
Engager

Did you manage to solve the task? I need the same.

0 Karma

pj
Contributor

Makes sense - thanks

0 Karma

mahamed_splunk
Splunk Employee

You can avoid the extra search head by configuring either SH1 or SH2 to search both sites.

As for the cluster master, yes: you need an additional one to monitor the other cluster. The cluster master can be hosted on a VM; it doesn't require a lot of firepower.
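A sketch of how one search head could be pointed at both clusters, using the older-style multi-cluster search settings in server.conf. The stanza labels, host names, and key placeholders are assumptions for illustration.

```ini
# server.conf on the search head that searches both clusters
[clustering]
mode = searchhead
# Each entry refers to a [clustermaster:<label>] stanza below
master_uri = clustermaster:site1, clustermaster:site2

[clustermaster:site1]
master_uri = https://master-site1.example.com:8089
pass4SymmKey = <site1-cluster-key>
multisite = false

[clustermaster:site2]
master_uri = https://master-site2.example.com:8089
pass4SymmKey = <site2-cluster-key>
multisite = false
```

The search head only queries the peers of each cluster; it does not cause any data to be copied between them.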

0 Karma

pj
Contributor

Thanks Mahamed, this makes sense. However, it also requires an extra search head, an extra cluster master node, and additional overhead. Therefore, I was looking to see if there might be something a little more efficient from an infrastructure/management point of view.

0 Karma