Multi-Site Cluster | Failure Tolerance

sagaraverma · ‎03-30-2020

Let's say I have 4 indexers at one site, 'AB', and 4 indexers at another site, 'CD' (the DR site).
site_replication_factor=origin:2,total:3
site_search_factor=origin:1,total:2

Question 1: I understand from this document that if 3 of my indexers go down at site 'AB', the 4th indexer will keep ingesting data and will hold copies in a reserve state, to be distributed when the other indexers come back. Please confirm.

Question 2: What if all 4 of my indexers go down at site 'AB'? How would ingestion be managed then? Would the cluster master automatically redirect data ingestion to the indexers at DR site 'CD'?

Question 3: I have a site_replication_factor of origin:2,total:3. Say two indexers at site 'AB', both holding a copy of the same bucket, go down, so both copies of that bucket become unavailable at site 'AB'. Would the cluster master then have a copy sent from DR site 'CD' and replicated to the 2 running indexers at site 'AB'?

richgalloway · ‎03-30-2020

Answer 1: Confirmed. Sort of. There's no such thing as a "reserve state". Buckets simply won't be replicated until another AB indexer comes on-line.

Answer 2: It depends on how you've set up your outputs.conf files in your environment. If they contain all indexers or use Indexer Discovery then the sending systems will send their data to a surviving indexer. If there are servers configured to send only to site AB then they will buffer data until an AB indexer is available.

Answer 3: Yes, the CM will try to restore the replication and search factors by copying data from the CD site to surviving indexers in the AB site.
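
To illustrate Answer 2, here is a minimal sketch of an outputs.conf whose server list covers indexers at both sites, so a forwarder can keep sending even if every AB indexer is down. The group name, host names and port are hypothetical placeholders, not values from this thread.

```ini
# outputs.conf on a forwarder (sketch; all names below are placeholders)
[tcpout]
defaultGroup = all_site_indexers

[tcpout:all_site_indexers]
# Indexers from both sites, so the forwarder load-balances across whichever are reachable.
server = ab-idx1.example.com:9997, ab-idx2.example.com:9997, cd-idx1.example.com:9997, cd-idx2.example.com:9997
```

A forwarder whose server list names only AB indexers would instead queue its data until one of them returns.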

---
If this reply helps you, Karma would be appreciated.

sagaraverma · ‎03-30-2020

"Answer 2: It depends on how you've set up your outputs.conf files in your environment. If they contain all indexers or use Indexer Discovery then the sending systems will send their data to a surviving indexer. If there are servers configured to send only to site AB then they will buffer data until an AB indexer is available."

What do I need to look at to learn more about this behavior? Is this something the cluster master controls?

Please note that we have an active-active configuration, where both sites receive data from different clients and also act as DR for each other.

richgalloway · ‎03-30-2020

Look at the outputs.conf file(s) in your deployment server's deployment-apps directory. They may also be in your configuration management tool (Ansible, Puppet, etc.).

Active/active is normal multi-site cluster behavior.


sagaraverma · ‎03-30-2020

Active-active in the sense that both sites carry licensing cost for the clients they ingest data from, rather than one site acting only as DR.
To be honest, none of the documents I have gone through on the official Splunk website explains the case where a single CM handles two active sites that fulfil HA and DR requirements while also ingesting their own data. Could you please point me to an online doc that explains this, with all the needed settings for the .conf files?

sagaraverma · ‎03-30-2020

Also, the Ansible playbook that converts the two sites into a multi-site active-active configuration seems to set only the following parameters in server.conf:
'constrain_singlesite_buckets' -- 'false'
'multisite' -- 'true'
'available_sites' -- 'site1,site2'
'site_replication_factor' -- 'origin:2,total:3'
'site_search_factor' -- 'origin:1,total:2'
'replication_factor' -- '1'
'search_factor' -- '1'
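
As a rough sketch, these parameters would typically end up in the cluster master's server.conf along the following lines. The stanza placement and the `site` and `mode` lines are assumptions based on the usual multisite layout, not output taken from the playbook.

```ini
# server.conf on the cluster master (sketch; assumes the master itself is assigned to site1)
[general]
site = site1

[clustering]
mode = master
multisite = true
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2
constrain_singlesite_buckets = false
replication_factor = 1
search_factor = 1
```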

There seems to be nothing specific in outputs.conf other than some parameters for forwarding the CM's own data to the indexers:
forwardedindex.filter.disable = true
indexAndForward = false

richgalloway · ‎03-30-2020

Your outputs.conf file must have a server setting or an indexerDiscovery setting.
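
For the indexerDiscovery case, a minimal sketch of a forwarder's outputs.conf might look like this. The stanza names, URI and key are hypothetical placeholders, and `master_uri` is the setting name from the era of this thread, so check it against the outputs.conf.spec for your version.

```ini
# outputs.conf on a forwarder (sketch; names, URI and key are placeholders)
[indexer_discovery:cluster1]
# The cluster master that hands out the current list of indexers.
master_uri = https://cm.example.com:8089
pass4SymmKey = <key matching the master's indexer discovery configuration>

[tcpout:discovered_indexers]
# Refers to the [indexer_discovery:cluster1] stanza above instead of a static server list.
indexerDiscovery = cluster1

[tcpout]
defaultGroup = discovered_indexers
```

With this approach the forwarder gets its indexer list from the master, so no indexer host names are hard-coded on the sending side.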


sagaraverma · ‎03-31-2020

But that's what you suggested I look at in outputs.conf:
"Your outputs.conf file must have a server setting or an indexerDiscovery setting."

And here is what Splunk says about these parameters:

"server = [<ip>|<servername>]:<port>, [<ip>|<servername>]:<port>, ...
* A comma-separated list of one or more systems to send data to over a
  TCP socket.
* Required if the 'indexerDiscovery' setting is not set.
* Typically used to specify receiving Splunk systems, although you can use
  it to send data to non-Splunk systems (see the 'sendCookedData' setting).
* For each system you list, the following information is required:
  * The IP address or server name where one or more systems are listening.
  * The port on which the syslog server is listening.

indexerDiscovery = <name>
* The name of the master node to use for indexer discovery.
* Instructs the forwarder to fetch the list of indexers from the master node
  specified in the corresponding [indexer_discovery:<name>] stanza.
* No default."

richgalloway · ‎04-01-2020

My point was outputs.conf does not restrict the SH to search on any specific indexers. Nor does it restrict ingestion of data to any specific site.


sagaraverma · ‎03-30-2020

outputs.conf for the CM, right?
If you can point me to an online doc that explains how these parameters control this mechanism in an active-active cluster, that would be really helpful.

richgalloway · ‎03-30-2020

outputs.conf not for CM, but for everything else (except indexers). The file is documented in the Admin manual and in $SPLUNK_HOME/etc/system/README/outputs.conf.spec.


sagaraverma · ‎03-31-2020

I understand that this would restrict the SH to searching on site-specific indexers.

But what about restricting ingestion of data to a specific site?
We are using HEC.

richgalloway · ‎03-31-2020

outputs.conf has nothing to do with running searches. Nor does it have anything to do with ingesting data. It merely tells a Splunk instance where to put its data.

