Machine learning outliers

rrovers
Path Finder

I want to use the Machine Learning Toolkit to detect outliers.

I've made a query with earliest=-2mon@mon latest=@mon so that Splunk determines the outlier thresholds for that period. I want to run the search every day and have the alert send an email only when a new outlier has been detected since the last run.

I can't figure out how to do this: every time the search runs, it reports all outliers from the last 2 months.


to4kawa
SplunkTrust

rrovers
Path Finder

Thank you for the information. Your answer is quite extensive and probably useful for learning more about machine learning.

What I want to know is whether it is possible to let machine learning determine the lower bound and upper bound over a long period (for example 2 months, or maybe even 1 year) and then run the search every day as an alert that only reports the outliers that are new since the previous day.
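A minimal sketch of that split inside the Machine Learning Toolkit, assuming the DensityFunction algorithm is available in your MLTK version: one scheduled search fits a model on the long period and saves it under a hypothetical model name, `daily_count_outliers`, and the daily alert only applies that saved model to the last complete day. The index and sourcetype are the ones from the example later in this thread; the time ranges and schedules are assumptions. `timechart span=1d count` is used so that days without any events still appear with a count of 0.

Training search (for example scheduled weekly):

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@d latest=@d
| timechart span=1d count
| fit DensityFunction count into daily_count_outliers
```

Daily alert search (trigger when the number of results is greater than 0):

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1d@d latest=@d
| timechart span=1d count
| apply daily_count_outliers
| where 'IsOutlier(count)'=1
```

Because the model is only refitted on the training schedule, yesterday's count is judged against what was learned from the long period, and each run can report at most the one new day.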


to4kawa
SplunkTrust

https://docs.splunk.com/Documentation/MLApp/5.2.0/User/DNOlegacyassist


I don't think we need to get too hung up on machine learning.


rrovers
Path Finder

That functionality seems to be the same as what I used for my alert, and the result is not what I'm looking for.

Let me try to clarify it with an example.

In the Machine Learning app I created an experiment for this example with this simple search:

index=_internal sourcetype="splunkd_remote_searches" earliest=-1w@w latest=now
| eval day=strftime(_time,"%Y-%m-%d")
| stats count by day 

One outlier is detected.

After saving this experiment, I created an alert from the overview screen (Manage > Create Alert).

My goal is to use the period since the start of last week to determine the lower and upper bounds, but to receive an alert only when there are new outliers since the last run. Right now, every run over the last week detects the same outlier again.

I wonder whether what I want is possible.

to4kawa
SplunkTrust

Use outputcsv and build your query against the CSV.
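A minimal sketch of that idea, with illustrative assumptions: daily counts, bounds of mean ± 2 standard deviations (similar in spirit to the standard-deviation option of the Detect Numeric Outliers assistant; the multiplier 2 is an assumption) and a hypothetical CSV named `known_outliers.csv`. The search detects outliers over the whole window, drops the days already stored in the CSV, and appends the remaining new outlier days to it, so only those reach the alert. The CSV is assumed to exist already; one way to seed it is to run the search once without its last two lines, ending it with `| outputcsv known_outliers.csv`, so that all current outliers count as already seen.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@d latest=@d
| timechart span=1d count
| eventstats avg(count) as avg_count stdev(count) as sd_count
| eval lowerbound=avg_count-2*sd_count, upperbound=avg_count+2*sd_count
| where count<lowerbound OR count>upperbound
| eval day=strftime(_time, "%Y-%m-%d")
| search NOT [| inputcsv known_outliers.csv | table day]
| outputcsv append=true known_outliers.csv
```

Scheduled daily with the trigger condition "number of results is greater than 0", the alert should then only fire for outlier days that were not reported before.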

rrovers
Path Finder

Hi, thanks for your answer. I gave it a try, but I'm missing some information.

Can you please explain a bit more?


rrovers
Path Finder

I think I solved it by joining 2 searches.

The first one determines the lower bound and the upper bound over a long period (the last 2 months).

The second one checks whether the count of events for the last day is less than the lower bound or greater than the upper bound determined by the first one.
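The same two-window idea can also be sketched as a single search instead of a join (a swapped-in technique, not necessarily the one used here): daily counts are computed for the last 2 months, only the days before yesterday feed the bounds through eventstats, and yesterday's row is kept only if it falls outside them. The bound formula (mean ± 2 standard deviations) and the schedule (daily, shortly after midnight, triggering when the number of results is greater than 0) are assumptions.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@d latest=@d
| timechart span=1d count
| eval is_last_day=if(_time>=relative_time(now(), "-1d@d"), 1, 0)
| eval baseline_count=if(is_last_day=0, count, null())
| eventstats avg(baseline_count) as avg_count stdev(baseline_count) as sd_count
| eval lowerbound=avg_count-2*sd_count, upperbound=avg_count+2*sd_count
| where is_last_day=1 AND (count<lowerbound OR count>upperbound)
```

Keeping yesterday out of the baseline means an unusual day cannot widen the bounds it is being tested against.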


2savage
Engager

Aloha @rrovers ,

I think what @to4kawa is suggesting is that you create a CSV file and write your results to it. For example, you could use | outputcsv to write _time, machinename, and whatever other fields you find valuable, and read the file back with | inputcsv. Each day you can then query that file to filter out the outliers from previous days.

I am also going to experiment with machine learning, starting with building profiles of users and computers, probably based on login patterns. Using outputcsv is a good way to keep track of results, though I'm sure there are other ways to do it.

rrovers
Path Finder

Hi @2savage,

I solved it by collecting the daily results into a summary index (I prefer summary indexes over lookups for this kind of functionality).

I don't think this is really machine learning, but it works fine for me.
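A minimal sketch of the summary-index variant, assuming the default `summary` index and a hypothetical source name, `daily_remote_search_count`. The first search runs daily shortly after midnight and stores one event holding yesterday's count; the alert search, scheduled after it, reads the last 2 months of those small events, builds the bounds from every day except yesterday, and returns a row only when yesterday falls outside them. The mean ± 2 standard deviation bounds are an assumption.

Daily collect search:

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1d@d latest=@d
| timechart span=1d count
| collect index=summary source="daily_remote_search_count"
```

Alert search (trigger when the number of results is greater than 0):

```spl
index=summary source="daily_remote_search_count" earliest=-2mon@d latest=@d
| eval last_count=if(_time>=relative_time(now(), "-1d@d"), count, null())
| eval baseline_count=if(isnull(last_count), count, null())
| stats avg(baseline_count) as avg_count stdev(baseline_count) as sd_count max(last_count) as last_count
| eval lowerbound=avg_count-2*sd_count, upperbound=avg_count+2*sd_count
| where last_count<lowerbound OR last_count>upperbound
```

Since the summary holds only one event per day, the baseline search should stay cheap even if the window is extended to something like a year.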
