Deployment Architecture

Is it possible to set site_search_factor = origin:1, site4:0, total:3 in a multisite indexer cluster to restrict searchable data being indexed to a specific site?

Engager

Hi Team,

I have a theoretical question about multisite indexer clustering.

As I understand it, site_replication_factor sets how many copies of the raw (unsearchable) data are replicated within the cluster, and site_search_factor sets how many copies of searchable data (which also contain the raw data) are kept. Could I then set up an environment with a configuration such as:

[clustering]
mode = master
multisite=true
available_sites=site1,site2,site3,site4
site_replication_factor = origin:1,site4:1,total:2
site_search_factor = origin:1,site4:0,total:3

OR

[clustering]
mode = master
multisite=true
available_sites=site1,site2,site3,site4
site_replication_factor = origin:1,site4:1,total:2
site_search_factor = origin:1,site1:1,site2:1,site3:1,total:3

The objective would be to have a designated site that serves only as a store for the raw (unsearchable) data, and therefore wouldn't be searched or used for anything else, while the three other sites run a more standard configuration, where each has a copy of its own raw data and a distributed copy of the searchable data.

I can't find anything in the documentation that says whether you can specify site4:0 to prevent searchable data from being replicated to a specific site.

If the above works, it would minimize the copies of raw (unsearchable) data within the cluster (saving space) while ensuring there is always a site with a full backup of ALL raw data from across the cluster, which could be used to rebuild ALL indexed data in the event of an extensive disaster.

Thanks!


Legend

First, you are a bit mixed up in your definition of the search factor. The search factor can never be larger than the replication factor. The search factor defines how many of the replicated buckets will be searchable. The search factor is not "added to" the replication factor. I think of it this way:

  • replication factor = how many copies of the rawdata
  • search factor = how many copies of the tsidx files (and the tsidx files can't exist without the corresponding rawdata)
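
The two bullets can be illustrated with a toy model (plain Python, not Splunk code; `bucket_copies` is a made-up helper): every searchable copy is a raw copy plus tsidx files, which is why the search factor can never exceed the replication factor.

```python
def bucket_copies(replication_factor: int, search_factor: int):
    """Describe each copy of one bucket in this toy model."""
    if search_factor > replication_factor:
        # tsidx files can't exist without the corresponding rawdata
        raise ValueError("search factor cannot exceed replication factor")
    copies = []
    for i in range(replication_factor):
        # The first `search_factor` copies also carry tsidx files.
        copies.append({"rawdata": True, "tsidx": i < search_factor})
    return copies

# With RF=3 and SF=2: three rawdata copies, two of them searchable.
print(bucket_copies(3, 2))
```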

Now, I think you can do what you want, but the syntax needs to be something like this:

available_sites=site1,site2,site3,site4
site_replication_factor = origin:1,site4:1,total:3
site_search_factor = origin:1,site4:0,total:3

This would force site4 to have only non-searchable buckets - but it would have a copy of all the rawdata in case of a disaster.

BTW, I assume that you have no forwarders sending data to site4. If you do, then site4 will be the origin site for some data, and therefore there will be searchable buckets at site4.
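
That origin caveat can be sketched as a toy placement rule (purely illustrative; `searchable_sites` is a hypothetical helper, not Splunk's actual assignment algorithm, and it assumes the origin requirement overrides a site's explicit 0):

```python
def searchable_sites(origin_site: str, spec: dict) -> dict:
    """Minimum searchable copies per site for one bucket (toy model).

    `spec` is a parsed site_search_factor, e.g.
    {"origin": 1, "site4": 0, "total": 3}.  Assumption: the origin
    requirement applies to whichever site the data arrived at, even
    a site otherwise pinned to 0.
    """
    minimums = {s: n for s, n in spec.items() if s.startswith("site")}
    origin_min = spec.get("origin", 0)
    minimums[origin_site] = max(minimums.get(origin_site, 0), origin_min)
    return minimums

# A bucket originating at site4 still gets a searchable copy there:
print(searchable_sites("site4", {"origin": 1, "site4": 0, "total": 3}))
# {'site4': 1}
```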

I do have to ask though: why have so many sites if you aren't going to have at least one copy of the buckets (searchable or not) at each site? Since sites are purely defined by you, and not actually tied to geography, I would only have 2 sites in your example.

available_sites=site1,site2
site_replication_factor = origin:1,site2:1,total:3
site_search_factor = origin:1,site2:0,total:3

Finally, like you, I did not find anything in the documentation that said whether the search factor could be explicitly set to zero for a site.


Path Finder

Hi @lguinn2 ,

I liked the above answer.

I have a requirement where we have 3 sites and are planning to reduce the search factor.

Current Config:

[clustering]
site_search_factor = origin:1,site1:1,site2:1,site3:1,total:3

 

To reduce the search factor to 2 across the sites, will the settings below work?

[clustering]
site_search_factor = origin:1,site1:1,site2:1,site3:1,total:2
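
If I remember the server.conf rules correctly (worth verifying against the docs), the explicit per-site values must fit within total, so listing site1:1,site2:1,site3:1 with total:2 may be rejected. A quick sketch to sanity-check a spec string (`parse_factor` and `explicit_sites_fit` are made-up helpers, and the "explicit sites must sum to no more than total" rule is my assumption):

```python
def parse_factor(spec: str) -> dict:
    """Parse 'origin:1,site1:1,total:2' into {'origin': 1, ...}."""
    return {k: int(v) for k, v in
            (pair.split(":") for pair in spec.replace(" ", "").split(","))}

def explicit_sites_fit(spec: str) -> bool:
    """Check that the explicit per-site minimums fit within total."""
    f = parse_factor(spec)
    total = f.pop("total")
    f.pop("origin", None)  # origin overlaps with one of the listed sites
    return sum(f.values()) <= total

print(explicit_sites_fit("origin:1,site1:1,site2:1,site3:1,total:2"))  # False
print(explicit_sites_fit("origin:1,total:2"))                          # True
```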

Engager

Thanks for the clarification. I did understand that; I just didn't do a good job of explaining it!
I'll give this a try at some point.

With regard to why so many sites, this was just an example.


SplunkTrust

Hi,

Did you get anywhere with this?
I'm also interested in what you are suggesting as we might have legal issues if some data leaves certain countries.

The way we initially planned to approach this was to designate a pair of Heavy Forwarders per country to perform forwarding, local indexing, and filtering. If we ever needed to search for country-sensitive data, we could always go to the local HF and use the GUI there, as the data won't be searchable from anywhere else.

Thanks,
J


Engager

Hi J,

Unfortunately not, I didn't get the time to test. Might give it a go in a few weeks when I'm not onsite.

But I agree, that type of use case is exactly what I was thinking!
