lradics

Path Finder

07-17-2017
10:17 AM

Hello!

I have numeric data which has values primarily in the range of 500-1,000, with acceptable values being in the range of 500-10,000. I also have a number of outliers below 500 (ranging from 3 to 499), and some outliers above 10,000 (the most noticeable being as high as 1,000,000). I would like to use the Machine Learning Toolkit to detect all the outliers (both those too high, and those too low), ideally to set up some sort of alert.

My base search is pretty straightforward:

```
index=xxx source="xxx" reactionTime!=-1 reactionTime=* user=* | dedup ID
```

I tried using the built-in Detect Numeric Outliers assistant, but the higher outliers threw it off (even if I excluded the absolute highest), so it couldn't reliably mark values below 500 as outliers.
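To see how very large values can hide the low ones, here is a minimal Python sketch of a mean ± k·stdev bound. This is the kind of rule a standard-deviation method uses; it is an illustration, not the assistant's actual implementation. The sample values and the multiplier `k=2` are hypothetical, chosen only to resemble the ranges described above.

```python
import statistics

# Hypothetical sample, NOT the poster's data: mostly 500-1,000,
# two low outliers (3, 120) and one extreme high value (1,000,000).
values = [3, 120, 520, 610, 700, 750, 800, 850, 900, 950, 980, 1_000_000]


def mean_stdev_bounds(data, k=2.0):
    """Return (lower, upper) = mean -/+ k * sample standard deviation."""
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)
    return mean - k * stdev, mean + k * stdev


def flag_outliers(data, lower, upper):
    """Return the values that fall strictly outside [lower, upper]."""
    return [v for v in data if v < lower or v > upper]


lower, upper = mean_stdev_bounds(values)
print(f"lower={lower:.1f} upper={upper:.1f}")
print("flagged:", flag_outliers(values, lower, upper))
```

With the single 1,000,000 value in the sample, the standard deviation grows so large that the lower bound goes negative. As a result, 3 and 120 are never flagged, while 1,000,000 is.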

More recently I've been working with the OneClassSVM algorithm; however, no matter what I do (I've tried adjusting every parameter I can), it only marks the bottom `nu` fraction of my data as outliers, completely ignoring the too-high values.

Is there any way to detect both upper and lower outliers for my data, either with one of the above-mentioned algorithms or through some other method altogether?

Thank you!



Re: Splunk Machine Learning Toolkit: detecting both upper and lower outliers

skoelpin

SplunkTrust

07-17-2017
10:51 AM

Hello @iradics

Are you familiar with the `predict` command?

https://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Predict


lradics

Path Finder

07-17-2017
11:18 AM

Hi @skoelpin,

I am familiar with the command. How would you suggest I use it? I don't want to forecast values for missing data; I'm looking instead to detect outliers in existing data. Is there a way to do that with `predict`?


jcvytla

New Member

03-27-2018
04:00 PM

Hello @iradics

I'm also working on a similar problem. I need your help seeing the solution through.


cmerriman

Super Champion

07-17-2017
11:29 AM

There are a few mathematical ways you can identify outliers; however, you can't save them as a model, at least not to my knowledge.

There is the `anomalydetection` command, which can save you a lot of time compared with typing out the SPL that would create them.

`zscore` is for standard deviation, `histogram` is for median absolute deviation, and `iqr` is, well, IQR 🙂

https://docs.splunk.com/Documentation/SplunkCloud/6.6.0/SearchReference/Anomalydetection
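For comparison, here is a minimal Python sketch of median absolute deviation (MAD) bounds, median ± k·MAD, run on a hypothetical sample. This is a textbook version, not necessarily what `anomalydetection` computes internally. `k=3` is an arbitrary choice, and some implementations first scale the MAD by about 1.4826.

```python
import statistics

# Hypothetical sample, NOT the poster's data.
values = [3, 120, 520, 610, 700, 750, 800, 850, 900, 950, 980, 1_000_000]


def mad_bounds(data, k=3.0):
    """Return (lower, upper) = median -/+ k * MAD (unscaled MAD)."""
    med = statistics.median(data)
    mad = statistics.median([abs(v - med) for v in data])
    return med - k * mad, med + k * mad


lower, upper = mad_bounds(values)
flagged = [v for v in values if v < lower or v > upper]
print(lower, upper, flagged)
```

The median and MAD barely move when 1,000,000 is added, so the lower bound stays positive (265 with this sample). As a result, 3 and 120 are flagged along with 1,000,000. When data is mostly 500-1,000 but acceptable up to 10,000, a tight `k` like this would also flag legitimate values above 1,285, so the multiplier needs tuning.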


lradics

Path Finder

07-17-2017
01:23 PM

That's a pity about not being able to save as a model; I was hoping to be able to train whatever method I ended up using.

I've been playing with the various forms of the anomalydetection command, and none of them are doing quite what I want, at least so far - they're all marking the highest values as outliers, but none of them do anything with the lower outliers. Do you know of a specific parameter that does that? And I'll keep exploring it, to see what I can do...


cmerriman

Super Champion

07-17-2017
01:33 PM

It's likely because the computed lower bound is negative, so nothing can fall below it.

This SPL is similar to the IQR method, and you might be able to tweak the lower-bound `eval` to get it where you want it. It breaks the data into hourly counts and then uses the overall median to split it into sections to find the outliers. The upper and lower bounds are set to 2 IQRs above and below the median. You can adjust those (as well as the other parts, obviously, to fit your needs) to see what works for you.

```
|timechart span=1h count
|eventstats median(count) as median p25(count) as p25 p75(count) as p75
|eval IQR=p75-p25
|eval lower_bound=median-(IQR*2)
|eval upper_bound=median+(IQR*2)
|eval isOutlier=if(count>upper_bound OR count<lower_bound,10,0)
```
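Here is a minimal Python sketch of the same arithmetic as the SPL above: bounds at median ± 2·IQR, with a flag value of 10 for outliers. It may help when checking bounds offline. The SPL works on hourly counts from `timechart`; the same calculation can be applied to any numeric field, such as per-event `reactionTime` values. The sample list is hypothetical. Note also that `statistics.quantiles(..., method="inclusive")` (Python 3.8+) may interpolate percentiles differently from Splunk's `p25`/`p75`, so the bounds can differ slightly.

```python
import statistics

# Hypothetical values standing in for the hourly counts (or reactionTime values).
data = [3, 120, 520, 610, 700, 750, 800, 850, 900, 950, 980, 1_000_000]


def iqr_median_flags(values, k=2.0):
    """Mirror the SPL: bounds are median -/+ k*IQR; flag 10 if outside, else 0."""
    p25, _, p75 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = p75 - p25
    median = statistics.median(values)
    lower_bound = median - iqr * k
    upper_bound = median + iqr * k
    # 10 mirrors the SPL's isOutlier value for an outlier; 0 means in range.
    flags = [10 if (v > upper_bound or v < lower_bound) else 0 for v in values]
    return lower_bound, upper_bound, flags


lower_bound, upper_bound, flags = iqr_median_flags(data)
print(lower_bound, upper_bound)
print([v for v, f in zip(data, flags) if f == 10])
```

Because the median and IQR depend only on the middle of the distribution, the single 1,000,000 value does not drag the lower bound below zero. With this sample, both the low and the high outliers are flagged.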


lradics

Path Finder

07-18-2017
07:17 AM

That looks promising - thank you! I'll work with that and see what I can get.