We are building out our new Splunk environment and I wanted to see what everyone else might be using to monitor their health across the environment. We are going to be distributed, so monitoring one server isn't going to work.
What prompted the question is a .conf slide deck I found that mentioned all three of these apps: the Distributed Management Console (DMC), Fire Brigade, and Splunk Health Check Overview. What has your experience been with them? Is there so much overlap that one of them isn't worth running, or does each one add something useful to the overall picture of our health?
As the Product Manager for DMC, I have a bit of a bias, but here is my take:
DMC is the only Splunk Monitoring solution that has the full weight of Splunk Engineering, Sustaining, Support, and Documentation behind it
DMC ships with the product out of the box
DMC strives to be comprehensive of all Splunk sub-systems (Search Head Clustering, Forwarders, etc.)
DMC has proactive alerting built in through the Platform Alert mechanism
That said, I love the fact that the platform exposes enough information through logs, metrics, and introspection that the community can build useful views and features for customers that the Splunk product team may not have dreamed up yet. In fact, the DMC product and engineering teams have collaborated with the maintainers of Fire Brigade and Health Check on entire sections of DMC and on ideas for making DMC better. The Indexes and Volumes section of DMC was built almost entirely with Fire Brigade as inspiration, and we've drawn many ideas from Health Check about how to better surface search activity.
As far as I know, all of them can be installed side by side without interfering with one another, so fortunately it is not an either/or choice if you like certain features of each.
Thanks for the insight. We are just starting off and wanted to get a good feel for what road we will need to go down. We have a strong requirement to show that our Splunk Infrastructure is working so I wanted to get a jump on the right tools.
Each of these is going to come with its own unique set of benefits.
I am personally a huge fan of Fire Brigade when it comes to managing your data retention policies and troubleshooting index-related issues. It has helped me out a number of times.
The Distributed Management Console is very useful for giving you a look at indexing rates and various performance-related metrics, license usage, topology, and forwarder connections.
Splunk Health Check Overview will give you some good information about search activity and scheduler activity. There may be some overlap between this and the DMC, but it does have its own unique set of searches and dashboards.
If you were running all 3 of these you'd be able to paint a pretty good picture of your environment.
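Since the original question was about a distributed deployment where watching one server isn't enough, here is a minimal Python sketch of a reachability check you could run alongside those apps. It is not part of any of the three apps. It assumes each instance exposes splunkd's REST API on the default management port 8089 and that the `/services/server/info` endpoint is reachable with the credentials you supply. The helper names (`server_info_url`, `summarize`, `fetch_info`) and the example host name are made up for illustration.

```python
import base64
import json
import urllib.request

MGMT_PORT = 8089  # assumption: splunkd's default management port; adjust if yours differs


def server_info_url(host, port=MGMT_PORT):
    """Build the REST URL for one instance's server/info endpoint."""
    return f"https://{host}:{port}/services/server/info?output_mode=json"


def summarize(payload):
    """Pull a couple of fields out of a parsed server/info JSON response."""
    content = payload["entry"][0]["content"]
    return {
        "serverName": content.get("serverName"),
        "version": content.get("version"),
    }


def fetch_info(host, username, password, context=None):
    """Query one instance (hypothetical helper; needs network access and valid credentials).

    Pass an ssl.SSLContext as `context` if your instances use certificates
    that the system trust store does not recognize.
    """
    request = urllib.request.Request(server_info_url(host))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10, context=context) as response:
        return summarize(json.load(response))
```

Looping `fetch_info` over your indexers and search heads (for example `fetch_info("idx1.example.com", user, password)`) only tells you that each splunkd answers and which version it runs. The detail on indexing rates, retention, and search activity still comes from the apps discussed above.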