I want to use the machine learning toolkit to detect outliers.
I've made a query with earliest=-2mon@mon latest=@mon so that Splunk determines the outlier thresholds for that period. I want to run the search every day and have the alert send an email when a new outlier has been detected since the last run.
I can't work out how to do this. Every time the search runs, it detects all outliers of the last 2 months.
Thank you for the information. Your answer is quite extensive and probably useful for learning more about machine learning.
What I want to know is whether it is possible to let machine learning determine the lower bound and upper bound over a long period (for example 2 months, or maybe even 1 year) and run the search every day as an alert that only gives me the new outliers (those since the previous day).
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/DNOlegacyassist
I don't think we need to get too hung up on machine learning.
The functionality seems to be the same as what I used for my alert. The result is not what I'm looking for.
Let me try to clarify it with an example.
In the machine learning app I created an experiment with this simple search to use for this example:
index=_internal sourcetype="splunkd_remote_searches" earliest=-1w@w latest=now
| eval day=strftime(_time,"%Y-%m-%d")
| stats count by day
1 outlier is detected.
After saving the experiment, I created an alert from the overview screen (manage - create alert).
My goal is to use the period since last week to determine the lower and upper bound, but only receive an alert when there are new outliers since the last run. Currently, every run over the last week detects the same outlier again.
I wonder whether what I want is possible.
Use outputcsv and build a query against the CSV.
Hi, thanks for your answer. I gave it a try, but I'm missing some information.
Can you please explain a bit more?
I think I solved it by joining two searches:
The first one determines the lower bound and the upper bound over a long period (the last 2 months).
The second one checks whether the event count of the last day is less than the lower bound or more than the upper bound determined by the first one.
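A minimal sketch of how two such searches could be combined, reusing the example search from earlier in the thread. It is not the poster's actual query: the mean plus or minus 2 standard deviations rule and the field names mean, sd and daily_count are assumptions chosen for illustration. The outer search counts yesterday's events, and the appendcols subsearch computes the bounds from the daily counts of the long period.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1d@d latest=@d
| stats count
| appendcols
    [ search index=_internal sourcetype="splunkd_remote_searches" earliest=-2mon@mon latest=@d
      | bin _time span=1d
      | stats count AS daily_count BY _time
      | stats avg(daily_count) AS mean stdev(daily_count) AS sd
      | eval lowerBound = mean - 2 * sd, upperBound = mean + 2 * sd
      | fields lowerBound upperBound ]
| where count < lowerBound OR count > upperBound
```

Scheduled daily with the alert condition "number of results is greater than 0", this returns a row only when yesterday's count falls outside the bounds. Subsearches are subject to runtime and result limits, so a long baseline period may need a different approach.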
Aloha @rrovers,
I think what @to4kawa is suggesting is that you create a lookup file and output your results to that lookup. For example, you would output _time, machinename, and whatever other fields you consider valuable using | outputcsv. In turn, you can query the lookup file each day to remove the previous days' outliers.
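A rough sketch of that idea, applied to the example search from earlier in the thread. Note that it swaps in outputlookup and lookup instead of outputcsv, because a file written by outputcsv is read back with inputcsv rather than with the lookup command. The file name known_outliers.csv, the field already_reported and the 2 standard deviation rule are hypothetical, and the lookup file is assumed to exist already (for example, created once with a day column) before the first scheduled run.

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1w@w latest=now
| eval day=strftime(_time,"%Y-%m-%d")
| stats count by day
| eventstats avg(count) AS mean stdev(count) AS sd
| eval lowerBound = mean - 2 * sd, upperBound = mean + 2 * sd
| where count < lowerBound OR count > upperBound
| lookup known_outliers.csv day OUTPUT day AS already_reported
| where isnull(already_reported)
| fields day count lowerBound upperBound
| outputlookup append=true known_outliers.csv
```

Outliers that were reported on an earlier run are filtered out by the lookup, and the remaining new ones are both appended to the file and returned as results, so the alert can trigger on "number of results is greater than 0".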
I am also going to be experimenting with machine learning and looking to build profiles for users and computers, probably with login patterns at first. Using outputcsv would be a good way to keep track of results, though I'm sure there are other ways to do it.
Hi @2savage,
I solved it by collecting the daily results to a summary index (I prefer summary indexes over lookups for this kind of functionality).
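The post does not show the searches, so the following is only one possible reading of it. The index name daily_count_summary is hypothetical and the index has to be created beforehand. A first scheduled search collects yesterday's count:

```spl
index=_internal sourcetype="splunkd_remote_searches" earliest=-1d@d latest=@d
| bin _time span=1d
| stats count BY _time
| collect index=daily_count_summary
```

A second search, used for the alert, could then derive the bounds from the collected daily counts and keep only yesterday's row. The 2 standard deviation rule is again an assumption.

```spl
index=daily_count_summary earliest=-2mon@mon latest=@d
| eventstats avg(count) AS mean stdev(count) AS sd
| eval lowerBound = mean - 2 * sd, upperBound = mean + 2 * sd
| where _time >= relative_time(now(), "-1d@d") AND (count < lowerBound OR count > upperBound)
```

Because the alert search reads one small event per day instead of the raw events, a baseline of 2 months or even a year stays cheap to evaluate.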
I don't think this is part of machine learning but it works fine for me.