Splunk Enterprise

How to create an alert that Detects anomalies or outliers in Splunk?

shashank_24
Path Finder

Hi, I have few alerts created which looks into failure rates of my services and I have put in a condition which says if the failure rate is > 10% AND number of failed request > 200 then trigger the alert.

This is really not the ideal way to do the monitoring. Is there a way in Splunk we can use the AI to detect anomalies or outliers over time? So basically if Splunk can detect a failure pattern and if that pattern is consistent then don't trigger an alert but if it goes beyond the threshold, only then trigger it?

Can we do this kind of stuff in Splunk using in-built  ML or AI?

Labels (2)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

Take a look at the ML toolkit - there are some good examples on outliers there - you can also roll your own, e.g. this type of search will look for hourly outliers outside 3 * stdev

search error
| bin _time span=1m
| stats count by _time
| streamstats window=60 avg(count) as avg stdev(count) as stdev
| eval multiplier = 3
| eval lower_bound = avg - (stdev * multiplier)
| eval upper_bound = avg + (stdev * multiplier)
| eval outlier = if(count < lower_bound OR count > upper_bound, 1, 0)
| table _time count lower_bound upper_bound outlier
Results Example

 

Career Survey
First 500 qualified respondents will receive a $20 gift card! Tell us about your professional Splunk journey.

Can’t make it to .conf25? Join us online!

Get Updates on the Splunk Community!

Take Action Automatically on Splunk Alerts with Red Hat Ansible Automation Platform

 Are you ready to revolutionize your IT operations? As digital transformation accelerates, the demand for ...

Calling All Security Pros: Ready to Race Through Boston?

Hey Splunkers, .conf25 is heading to Boston and we’re kicking things off with something bold, competitive, and ...

Beyond Detection: How Splunk and Cisco Integrated Security Platforms Transform ...

Financial services organizations face an impossible equation: maintain 99.9% uptime for mission-critical ...