Existential question here... 🙂
What is the appropriate mechanism in Splunk to have multiple (potentially hundreds) of alerts that are based on the latest events, rather than real-time or timeframe searches, while keeping our Splunk deployment sane and simple? (Is it even possible?)
Example:
I need an alert when a volume (disk) breaches an 80% used space threshold, and need it within 30 seconds of when Splunk gets an event. (Then similar alerts for NAS and SAN volumes, CPU, memory, interface utilization, and a whole bunch of other metrics.)
Setting up a few dozen of these realtime searches and respective alerts brings our cluster to its knees. Attempting to set up auto-refreshing and fast dashboards with the same metrics simply knocks it out. Whereas doing something like this in Solarwinds or Datadog - piece of cake, including statistics-based metrics (e.g. if a metric exceeds a 30-minute baseline by more than 20% over the last 3 minutes).
Is Splunk not the right product for the task? If it is, what is the technical term for my problem, how can it be solved in Splunk, and would you be so kind as to point me to where it's discussed?
Thanks!
P.S. Please read the question fully and please abstain from attempting to answer what you might think the question is. The question is specific enough:
What is the appropriate mechanism in Splunk to have multiple (potentially hundreds) of alerts that are based on the latest events, rather than real-time or timeframe searches, while keeping our Splunk deployment sane and simple?
To rephrase it a bit:
in an environment where the ability to set up fast alerts on a large number of metrics is important, is Splunk (OOTB) the right product for the task? If so, what are the mechanisms to accomplish that? (Because OOTB, Splunk is not suitable for it - at least not in my experience.)
P.P.S. Is metrics such a mechanism? If so what would be a good resource to get those set up and running fast? (Digging through the documentation tells me it's not a streamlined, easy experience.)
... View more