Hi,
I have a requirement to implement the Splunk Monitoring Console (DMC) in a High Availability (HA) setup. At present, we are operating with a single DMC server, which creates a single point of failure and poses operational risk in the event of downtime or hardware issues. To mitigate this, I am considering the deployment of two new virtual machines (VMs) configured as identical DMC instances, placed behind a load balancer to ensure redundancy and continuous availability.
Could you please share your perspective on whether this solution is technically sound, and highlight any potential risks, limitations, or lessons learned from your past experience that I should take into account before proceeding?
Thanks in advance
I endorse what the folks shared above. The MC on its own generates a lot of noise across the board, because of the packaged REST searches it relies on to work.
In my organization, what I did was set up a second MC box in another datacenter for HA, but the Splunk service is always kept stopped there.
At a very high level: an rsync script keeps both boxes in sync at a given interval, and a custom automation (SOAR, in this case) detects when the main MC stops responding to REST requests. When that happens, the playbook checks host availability and, if that check fails, starts the service on the second box.
At the load balancer, the health check also monitors REST, so a box that is not responding is marked offline. Once the service has been started on the second box, the health check marks it as up and brings it live.
Everything ends with an email communication so we know that the flip was done and the proper status of each step.
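The flow described above could be sketched roughly as follows. This is only an illustration of the decision logic, not the actual SOAR playbook: the function name, the retry count, the "investigate" outcome when the host is reachable but REST is not, and the choice to skip the email when everything is healthy are all assumptions. The probes and the start action are passed in as plain callables so the logic can be tried without touching a real box.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FailoverResult:
    action: str                       # "none", "investigate", "failover", "failover_failed"
    steps: List[str] = field(default_factory=list)


def run_failover_check(
    rest_ok: Callable[[], bool],        # hypothetical probe: does the primary MC answer REST?
    host_ok: Callable[[], bool],        # hypothetical probe: is the primary host reachable?
    start_standby: Callable[[], bool],  # hypothetical action: start Splunk on the second box
    notify: Callable[[str], None],      # hypothetical action: send the status email
    retries: int = 3,                   # assumption: retry before deciding to flip
) -> FailoverResult:
    steps: List[str] = []

    # Retry the REST probe so a single slow response does not trigger a flip.
    for attempt in range(1, retries + 1):
        if rest_ok():
            steps.append(f"REST probe {attempt}: ok")
            return FailoverResult("none", steps)
        steps.append(f"REST probe {attempt}: failed")

    if host_ok():
        # Assumption: host is up but REST is down, so leave it to a human.
        steps.append("host check: reachable")
        action = "investigate"
    else:
        steps.append("host check: unreachable")
        started = start_standby()
        steps.append(f"standby start: {'ok' if started else 'failed'}")
        action = "failover" if started else "failover_failed"

    # Every non-healthy run ends with a message listing the status of each step.
    notify("\n".join(steps))
    return FailoverResult(action, steps)
```

In a real setup, `rest_ok` would presumably be an HTTPS request to the primary's management port and `start_standby` would run `splunk start` on the second box; both are left out here because the details depend on your environment.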
It's one alternative for keeping the MC available, but only if you actually need it; otherwise it may not be worth the stress.
The main question here is what problem are you trying to solve?
In a typical scenario, the ability to restore a VM relatively quickly (from relatively frequent snapshots, or simply by a rebuild and restore, which is quite fast if properly automated) is considered "good enough".
The MC on its own is not required for correct operation of the Splunk environment. Yes, it's a useful tool for an admin to check if everything is running OK (and diagnose problems should they arise) but it's not necessary for the environment to work.
If it's only "checkbox HA" because your organisation's internal policy states that "everything must be HA" (yes, I've seen similar cases), it might be easier for everyone involved to declare an exception to the policy (with a proper explanation/justification), since building such an "HA" setup can cause more problems than it's worth.
Got it. That means adding more DMC servers will put extra performance overhead on the monitored search peers, since both need to pull the metrics to populate the MC dashboards. Correct?
Unlike the Cluster Manager, Splunk makes no provision for HA in the Monitoring Console (MC). That means changes made on one node will not automatically be applied on the other. Setting up the MC as a search head cluster (SHC) may help, but that's an unsupported configuration and is not known to replicate MC artifacts. Using an SHC also adds complexity and means you'll need four VMs: three for the cluster and one for a deployer.
Having multiple MC hosts also adds to the search load imposed on the monitored instances.
A load balancer is not strictly necessary since there's no read load to balance. You can achieve the same goal with DNS round-robin.