I've looked through the examples that ship with the Machine Learning Toolkit (MLTK) app and was wondering whether anyone has used the MLTK to detect outliers in Splunk log traffic, or in a similar data set with multiple simultaneous streams of interest. The "Detect Numeric Outliers" example does exactly what I want, but I can only use it on one "stream" of data at a time. I've used it successfully on a single index, but I'd like to monitor multiple indexes simultaneously to keep a better eye on my data.
The graph generated in the provided example, which shows the upper/lower bound interval and the outliers alongside the data, is helpful but not necessary. Ideally I would receive an email listing the "indexes of interest", along with a limited amount of recent traffic history for context when deciding whether to act on an issue.
I currently have a search that does this manually, but it is quite crude and I'd like to take advantage of Splunk's internal ML capabilities for scalability.
There's no built-in way to track multiple data streams at the same time with the "Detect Numeric Outliers" assistant as it ships with the app, but you can do it with the building blocks the MLTK provides. For example, if you want to receive alerts when a data stream has outliers, you could run each stream through the "Detect Numeric Outliers" assistant individually and set up an alert for it, which would then notify you about outliers on that stream through whichever alert action you choose. Likewise, you can use the "Outliers Chart" custom visualization to build a custom dashboard that monitors multiple streams of data.
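To illustrate the per-stream idea outside of Splunk, here is a minimal Python sketch, assuming each event is a `(stream, timestamp, value)` tuple where "stream" is something like the index name. It groups the points by stream, computes separate bounds for each (mean ± `multiplier` × sample standard deviation, similar in spirit to the assistant's standard-deviation option), and collects the streams that had outliers plus a few recent points for an alert email. The function names and the `multiplier`/`history` parameters are hypothetical and not part of the MLTK API. In SPL itself the grouping would normally come from a `by index` clause on `stats`/`eventstats`.

```python
from collections import defaultdict
from statistics import mean, stdev


def detect_outliers_by_stream(events, multiplier=2.0, history=3):
    """Flag outliers separately for each stream (hypothetical helper).

    events: iterable of (stream, timestamp, value) tuples.
    Returns {stream: {"lower", "upper", "outliers", "recent"}}.
    """
    # Group points by stream so each stream gets its own bounds.
    series = defaultdict(list)
    for stream, ts, value in events:
        series[stream].append((ts, value))

    results = {}
    for stream, points in series.items():
        points.sort()  # order each stream by timestamp
        values = [v for _, v in points]
        if len(values) < 2:
            continue  # sample stdev needs at least two points
        center = mean(values)
        spread = stdev(values)
        lower = center - multiplier * spread
        upper = center + multiplier * spread
        results[stream] = {
            "lower": lower,
            "upper": upper,
            "outliers": [(ts, v) for ts, v in points if v < lower or v > upper],
            # The last few points give context in an alert email.
            "recent": points[-history:],
        }
    return results


def streams_of_interest(results):
    """Sorted names of streams with at least one outlier."""
    return sorted(s for s, r in results.items() if r["outliers"])


# Usage: two hypothetical indexes, one with a traffic spike at the end.
web = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 500]
db = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 50]
events = [("web", t, v) for t, v in enumerate(web)] + [
    ("db", t, v) for t, v in enumerate(db)
]
results = detect_outliers_by_stream(events)
print(streams_of_interest(results))  # → ['web']
```

Because the spike is included in the mean and standard deviation it is being tested against, it inflates the spread. With only a handful of points per stream a real outlier can therefore slip inside the bounds. A median-based method is more robust in that case.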