
Machine Learning Toolkit: What is a good practical example to use this with alerting in a system with too many records for an All Time search?

SplunkTrust

I have downloaded the Machine Learning Toolkit and Showcase app and tested out the response time example. Looks cool!

I like that you are shown the exact Splunk code you could use to identify outliers.

The example shown is an "All time" search followed by | head 1000 to get the latest 1000 entries.

| inputlookup hostperf.csv | eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") | timechart span=10m max(rtmax) as responsetime | head 1000

And then you can use responsetime to find the outliers.

In practice, how would we use this in a system where we have too many records for an All Time search? I am looking for good practical examples of how one might use this library for alerting. Somehow you want to use your recent data to compare against a search over a longer period of time.


Re: Machine Learning Toolkit: What is a good practical example to use this with alerting in a system with too many records for an All Time search?

Splunk Employee

I think you are asking for the following:

The Numeric Outlier Detector, as of 2.1, does not store a model file for the statistical observations used to score anomalies. You can store the values from a previous search (a priori) by using | outputlookup MyNewModelFile.csv, and then use | inputlookup in a real-time search to enrich the new events with the observed values.
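As a minimal sketch of that round trip (the file name and statistic are illustrative), a scheduled search writes its observations out with outputlookup:

| inputlookup hostperf.csv
| stats median(rtmax) as observedMedian
| outputlookup MyNewModelFile.csv

and the short-window search pulls those stored values back in alongside the new results (the leading ... stands in for whatever your short-window search returns):

... | appendcols [| inputlookup MyNewModelFile.csv]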

Using your specific example, I would:
1) Click the Open in Search button, copying the search to a new Splunk window for editing.
2) Edit the search to look like:

| inputlookup hostperf.csv | eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3Q%z") | timechart span=10m max(rtmax) as responsetime | head 1000
| eventstats median("responsetime") as median 
| eval absDev=(abs('responsetime'-median))
| eventstats median(absDev) as medianAbsDev
| outputlookup MySavedModel.csv 

3) Append | outputlookup MySavedModel.csv, and perhaps remove the | head 1000 section as needed. The point of MySavedModel.csv is that you are storing the anomaly levels detected by every scheduled run over a long time window, and then scoring alerts based on a smaller "real time" search window.
4) Save the search as a report that runs in the background and updates itself every N minutes, as needed.
5) Use inputlookup (with a time join) on MySavedModel.csv to enrich the new data as it comes in and create the eval/alert. The new search will use:

| eval isOutlier=if('responsetime' < lowerBound OR 'responsetime' > upperBound, 1, 0)

using the old lowerBound and upperBound from the observed time window, and you can alert on the isOutlier flag.
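Putting the stored model and the real-time scoring together, the alerting search might look like the following sketch. The index and sourcetype, the 10-minute window, and the multiplier of 5 are all illustrative assumptions; the bounds are derived from the median and medianAbsDev columns stored in MySavedModel.csv:

index=main sourcetype=hostperf earliest=-10m
| timechart span=10m max(rtmax) as responsetime
| appendcols [| inputlookup MySavedModel.csv | stats latest(median) as median, latest(medianAbsDev) as medianAbsDev]
| filldown median medianAbsDev
| eval lowerBound=(median - medianAbsDev*5), upperBound=(median + medianAbsDev*5)
| eval isOutlier=if('responsetime' < lowerBound OR 'responsetime' > upperBound, 1, 0)
| where isOutlier=1

Save this as an alert that fires when results are returned. Because the model file is refreshed by the scheduled report, the bounds track the longer history without the alert ever having to run an All Time search itself.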
