Splunk Local Disaster Recovery Solution

jiaminyun · ‎10-15-2024

Hello, we urgently need a local disaster recovery solution for Splunk and would appreciate an explanation of best practices. The existing Splunk deployment consists of 3 search heads, 1 deployer, 1 master node, 1 DMC, 3 indexers, and 2 heavy forwarders. In this architecture the search factor and replication factor are both 2, and the cluster already holds existing data. The local disaster recovery requirements are: if the server room hosting the existing data center's Xinchuang SIEM system is shut down, the data in the disaster recovery room must still be queryable as normal; and shutting down the newly built disaster recovery room must not affect use of the existing data center's SIEM system. RPO is 0 (no data may be lost), and RTO is recovery within 6 hours.

richgalloway · ‎10-16-2024

The Splunk Validated Architectures manual should help. You may be interested in the M3/M13 or M4/M14 models. See https://docs.splunk.com/Documentation/SVA/current/Architectures/M4M14
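As a rough illustration of what a multisite indexer cluster looks like in configuration, here is a minimal sketch of the cluster manager's server.conf for two sites. The site names and the factor values are assumptions chosen to match a two-room setup with one copy per room; they are not taken from the SVA manual, and older releases use `mode = master` instead of `mode = manager`.

```ini
# server.conf on the cluster manager (sketch; site names are assumed)
[general]
site = site1

[clustering]
mode = manager
multisite = true
available_sites = site1,site2
# Keep one copy in the originating site and two copies in total,
# so with two sites each room ends up holding one copy of each bucket.
site_replication_factor = origin:1,total:2
site_search_factor = origin:1,total:2
```

Each indexer and search head would also need its own `site` value set in `[general]`. Whether one searchable copy per room is enough for the stated RPO and RTO depends on the failure scenarios you need to cover.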

---
If this reply helps you, Karma would be appreciated.

PickleRick · ‎10-16-2024

Well, it's actually _not_ disaster recovery. It's an HA solution with some assumed level of fault tolerance.

@jiaminyun There is no such thing as "0 RPO" unless you define some boundary conditions and prepare accordingly. An HA infrastructure like the ones from the SVAs can protect you in case of some disasters but will not protect you from others (like misconfiguration or deliberate data destruction). If you're OK with that - be my guest. Just be aware of it.

RTO actually depends on your equipment, storage and resources (including personnel) you can allocate to the recovery task.

jiaminyun · ‎10-22-2024

Thank you for your response. We arrived at our final same-city disaster recovery architecture by combining M3/M13 with UF clone dual writing!

jiaminyun · ‎10-29-2024

After testing UF output cloning, we found that it cannot achieve truly identical data distribution across multiple clusters! Is there a good solution for dual writing? This is very urgent!

PickleRick · ‎10-30-2024

If you mean sending to two output groups from a single forwarder - that works until one of them gets blocked. Then both stop. It's by design.
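For reference, output cloning from a single forwarder to two target groups is usually set up along the lines of the sketch below. The group names and indexer host names are hypothetical placeholders, and 9997 is only the conventional receiving port; adjust all of them to your environment.

```ini
# outputs.conf on the universal forwarder (sketch; names are hypothetical)
[tcpout]
# Listing two groups here makes the forwarder clone every event to both.
defaultGroup = primary_room, dr_room

[tcpout:primary_room]
server = idx-a1.example.com:9997, idx-a2.example.com:9997

[tcpout:dr_room]
server = idx-b1.example.com:9997, idx-b2.example.com:9997
```

With this layout both groups receive the same events, but as noted above, a blocked group stalls delivery to the other one as well, so cloning alone does not give two independent write paths.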
