We already have Splunk installed and are using it pretty heavily. Our use cases mainly involve data from log files and databases, and we have provided multiple teams with dashboards for their apps/systems.
Now my management would like us to evaluate Splunk as an enterprise monitoring tool, in other words as a replacement for something like SCOM, ITM, Up.Time, etc. For us, that would mean monitoring and alerting for about 10k servers (Windows, Linux, Unix), covering memory, disk, processors, services, processes, SQL, Exchange, web sites, etc.
Is anyone trying to use Splunk in that capacity and at that scale? I know the data is on the servers and there are apps out there to collect most of it. But I'm a little leery of bringing all of that data back to the infrastructure and searching on it there, as opposed to most other monitoring tools, which have an agent that receives some sort of policy and only sends alerts/status back to the infrastructure.
And then there's the management of it all: different servers with different alerting needs, thresholds, polling intervals and recurrence. Not to mention the requirement to be able to take action against a server depending on the issue, e.g. restarting a service or cleaning up disk space.
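To make that concrete, here is roughly the kind of per-server policy I have in mind, as a minimal Python sketch. All of the names (`Policy`, `Check`, `cleanup_disk`) are made up for illustration and don't come from Splunk or any other product; it just shows a threshold, a polling interval, a recurrence count, and an action evaluated on the server side, so that only the alert goes back to the infrastructure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical per-server policy, not a real product schema."""
    metric: str
    threshold: float   # breach when the sampled value exceeds this
    poll_seconds: int  # how often the agent would sample the metric
    recurrence: int    # consecutive breaches required before alerting
    action: str        # label of the remediation to trigger


class Check:
    """Tracks consecutive breaches for one policy on one server."""

    def __init__(self, policy: Policy) -> None:
        self.policy = policy
        self.breaches = 0

    def observe(self, value: float) -> Optional[str]:
        """Return the action label when the alert fires, else None."""
        if value > self.policy.threshold:
            self.breaches += 1
        else:
            self.breaches = 0  # any healthy sample resets the streak
        # Fire once, at the moment the streak reaches the recurrence count.
        if self.breaches == self.policy.recurrence:
            return self.policy.action
        return None


disk = Policy("disk_used_pct", threshold=90.0, poll_seconds=300,
              recurrence=3, action="cleanup_disk")
check = Check(disk)
samples = [95, 96, 80, 91, 92, 93, 94]
results = [check.observe(s) for s in samples]
```

With those samples, the action only comes back on the third consecutive breach (the 93 reading), and every other poll sends nothing. Multiply that by a different policy per server or server group and you get the management overhead I'm worried about.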
As is usually the case, I'm sure it can be done. But I feel like we would be shoehorning Splunk into that type of solution, and it may be more trouble than it's worth.
If anyone out there has any relevant experience and could share some advice/guidance, that would be great.
Thanks!