It would take a major catastrophe to bring down an entire AWS region. Spreading an app across availability zones in a single region is sufficient in most cases; it all depends on your risk tolerance, of course. Indexer replication is key to data protection, but it doesn't have to replicate to another region. The copies can live in other zones.

I understand the goal of minimizing outside dependencies. Splunk, however, is not fully HA and doesn't try to be. For instance, there can be only one cluster manager (CM), and there is no built-in mechanism for a hot CM to keep a cold CM current. Fortunately, that's not a problem, since a fresh CM can easily rebuild its state from information supplied by the indexers. The indexers just need to know where the new CM is, and that can be done using DNS (or other networking tricks).

Patient: Doctor, it hurts when I do this.
Doctor: Well, don't do that.

If a Splunk component cannot communicate with another, necessary Splunk component, then something is wrong with the architecture. Firewall rules or other changes need to be made so components can talk to each other as intended. Requiring forwarders to send data only to local indexers is reasonable and commonplace; it works well as long as the local indexers can replicate data to remote indexers. Requiring forwarders to talk only to a local deployment server/license manager/cluster manager (DS/LM/CM) is also common, mainly because most customers have only one. If the DS/LM/CM fails, the forwarder continues to function using the most recent configuration it has until the server is restored.

I like your idea #2. Avoid using intermediate forwarders as in your idea #3; that adds complexity and can hamper performance. Stick with your multi-site cluster to ensure your data exists in two places.
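As a sketch of the DNS approach mentioned above, each indexer can point its clustering stanza at a stable DNS alias instead of a specific host, so standing up a replacement CM only requires updating the CNAME. The hostname here is hypothetical, and on newer Splunk versions the equivalent settings are `mode = peer` and `manager_uri`:

```ini
# server.conf on each indexer (cluster peer)
# "cm.example.com" is a hypothetical CNAME that you repoint
# at the replacement CM after a failure
[clustering]
mode = slave
master_uri = https://cm.example.com:8089
pass4SymmKey = <your cluster secret>
```

Because the peers retain the bucket data, the fresh CM rebuilds its view of the cluster from the peers as they re-register through the alias.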
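The local-indexers-only pattern can be sketched in the forwarder's outputs.conf along these lines (hostnames are hypothetical). The forwarder sends only to its own site's indexers, and multisite replication carries the copies to the remote site:

```ini
# outputs.conf on a site-1 forwarder
[tcpout]
defaultGroup = site1_indexers

[tcpout:site1_indexers]
# load-balance across the local indexers only;
# cross-site copies come from indexer replication, not the forwarder
server = idx1.site1.example.com:9997, idx2.site1.example.com:9997
# request indexer acknowledgment so data is re-sent if an indexer fails mid-stream
useACK = true
```

This keeps the forwarder's network dependencies local while the cluster handles geographic redundancy.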