Deployment Architecture

Splunk Monitoring Console HA Set-up

srek3502
Explorer

Hi,

I have a requirement to implement the Splunk Monitoring Console (DMC) in a High Availability (HA) setup. At present, we are operating with a single DMC server, which creates a single point of failure and poses operational risk in the event of downtime or hardware issues. To mitigate this, I am considering the deployment of two new virtual machines (VMs) configured as identical DMC instances, placed behind a load balancer to ensure redundancy and continuous availability.

Could you please share your perspective on whether this solution is technically sound, and highlight any potential risks, limitations, or lessons learned from your past experience that I should take into account before proceeding?

Thanks in advance


victor_menezes
Communicator

I endorse what the folks shared above. The MC alone generates a lot of noise across the board because of the REST searches it ships with and runs by nature to do its job.

In my organization, what I did was stand up a second MC box in another datacenter for HA, but the Splunk service is always stopped there.
At a very high level: an rsync script keeps both boxes in sync at a given interval, and a custom automation (using SOAR in this case) watches for the main MC to stop responding over REST. When that happens, the playbook checks host availability and, if the host has really failed, starts the service on the second box.
At the load balancer, the health check also monitors REST, so a non-responding box is marked offline; once the service starts on the second box, the health check marks it up and brings it back into service.
Everything ends with an email notification so we know the flip happened and the status of each step.
It's a roundabout way to keep the MC structure alive, but only do this if you actually need it; otherwise it may not be worth the stress. A sketch of the check-and-failover logic is shown below.
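For illustration only, here is a minimal Python sketch of that check-and-failover flow. The hostnames, the SSH access, and the /opt/splunk install path are assumptions, not details from the setup above; in the original setup this logic lives in a SOAR playbook rather than a standalone script.

#!/usr/bin/env python3
"""Minimal sketch of the REST check / failover flow described above.

Assumptions (hypothetical, not from the original post): the hosts are
named mc-primary.example.com / mc-standby.example.com, this script has
SSH key access to the standby, and Splunk is installed under /opt/splunk.
The rsync job that keeps the two boxes in sync is not shown.
"""

import subprocess

import requests

PRIMARY = "https://mc-primary.example.com:8089"  # splunkd management port
STANDBY_HOST = "mc-standby.example.com"


def rest_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Probe a REST endpoint; any HTTP response (even 401) means splunkd is alive."""
    try:
        # verify=False because splunkd commonly runs with a self-signed cert
        resp = requests.get(f"{base_url}/services/server/info",
                            verify=False, timeout=timeout)
        return resp.status_code < 500
    except requests.RequestException:
        return False


if __name__ == "__main__":
    if not rest_is_up(PRIMARY):
        # Primary is not answering REST: start Splunk on the standby box.
        # The load balancer's own REST health check will then mark it up.
        subprocess.run(
            ["ssh", STANDBY_HOST, "sudo", "/opt/splunk/bin/splunk", "start"],
            check=True,
        )
        # ...followed by the notification email mentioned above.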


PickleRick
SplunkTrust

The main question here is what problem are you trying to solve?

In a typical scenario, the ability to restore a VM relatively quickly (by means of fairly frequent snapshots, or simply a rebuild and restore, which is quite fast if properly automated) is considered "good enough".

The MC on its own is not required for correct operation of the Splunk environment. Yes, it's a useful tool for an admin to check if everything is running OK (and diagnose problems should they arise) but it's not necessary for the environment to work.

If it's only "checkbox HA" because your organisation's internal policy states that "everything must be HA" (yes, I've seen such cases), it might be easier for everyone involved to declare an exception to the policy (with a proper explanation/justification), since building such an "HA" setup can cause more problems than it's worth.

srek3502
Explorer

Got it. That means adding more DMC servers will create performance overhead on the monitored search peers, since each instance needs to pull metrics to populate the MC dashboards. Correct?


richgalloway
SplunkTrust

That is correct.

Also what @PickleRick said.

---
If this reply helps you, Karma would be appreciated.

richgalloway
SplunkTrust

Unlike the Cluster Manager, Splunk makes no provision for HA in the Monitoring Console (MC). That means changes on one node will not automatically happen on the other. Setting up the MC as a search head cluster (SHC) may help, but that's an unsupported configuration and is not known to replicate MC artifacts. Using an SHC adds complexity and means you'll need four VMs: three for the cluster and one for a deployer.

Having multiple MC hosts also adds to the search load imposed on the monitored instances.

A load balancer is not strictly necessary since there's no read load to balance.  You can achieve the same goal with DNS round-robin.
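As a toy illustration of the round-robin idea, a client can simply try each address behind a shared DNS name until one answers. The name mc.example.com and port 8000 below are hypothetical:

import socket

MC_NAME = "mc.example.com"  # hypothetical name with one A record per MC host
MC_PORT = 8000              # Splunk Web default port


def first_reachable(name: str, port: int, timeout: float = 3.0):
    """Return the first resolved address that accepts a TCP connection."""
    for info in socket.getaddrinfo(name, port, type=socket.SOCK_STREAM):
        addr = info[4][0]
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return addr
        except OSError:
            continue  # this A record is down; try the next one
    return None


print(first_reachable(MC_NAME, MC_PORT))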

---
If this reply helps you, Karma would be appreciated.