Hi,
I have built an ML model for detecting categorical outliers. The base search for the model covers the last 30 days (the training set). An alert is scheduled to run every day and trigger if the number of results is greater than 0.
Today's alert will use training data from Aug 19 to Sep 17 and report any outliers it finds.
For tomorrow, I want to confirm: will it run the model again on training data from Aug 20 to Sep 18, or will it keep using the same data from Aug 19 to Sep 17 to detect outliers?
Kindly share your ideas.
You are right.
You can also detect outliers using Splunk core commands. For example, you can use outlier or anomalydetection (anomalydetection includes the capabilities of the outlier and anomalousvalue commands).
You can also detect them using a simple standard deviation or the average/mean/median/mode.
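As a rough sketch of the standard deviation approach mentioned above, using only core commands: the sourcetype and field names are borrowed from the search posted in this thread, and the threshold of 3 standard deviations is an assumption you would tune for your data.

```spl
sourcetype=list1
| stats count by device_name, command
| eventstats avg(count) as avg_count, stdev(count) as stdev_count
| eval isOutlier = if(abs(count - avg_count) > 3 * stdev_count, 1, 0)
| where isOutlier = 1
```

Like anomalydetection, this has no saved model: the average and standard deviation are recomputed from whatever time range the search covers each time it runs.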
Detect Numerical field/Categorical field doesn't use machine learning algorithms; it just uses Splunk core features.
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/AboutMLTK
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/Algorithms
Thank you @thambisetty
So if it is a core feature of Splunk, and I did not specify a time range in my query, it must have been checking for anomalies every day over the last 24 hours. It did not have a training set at all.
I should have used one of the three algorithms below.
Am I right?
So I wanted to know: will it retrain the model every 24 hours because of the scheduled alert, or will it detect outliers using the old training data set?
No. The alert scheduled every 24 hours will use the model that is already trained, provided you use the apply command in that alert.
sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"
I don't see any algorithm, fit command, or apply command in the above search. anomalydetection is just a Splunk core command, not an ML algorithm.
Have a look at the conf presentation below:
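To make the difference concrete, here is a minimal sketch of what a trained-model setup could look like. It assumes the DensityFunction algorithm is available in your MLTK version; the model name device_count_model, the daily binning, and the time ranges are hypothetical choices for illustration, not something taken from the original search.

Training search (run once, or on its own schedule if you want the model refreshed):

```spl
sourcetype=list1 earliest=-30d@d latest=@d
| bin _time span=1d
| stats count by _time, device_name, command
| fit DensityFunction count by "device_name" into device_count_model
```

Alert search (scheduled every 24 hours; it only scores new data against the saved model):

```spl
sourcetype=list1 earliest=-1d@d latest=@d
| bin _time span=1d
| stats count by _time, device_name, command
| apply device_count_model
| where 'IsOutlier(count)' = 1
```

With this split, the alert keeps using the old training data until the training search runs again. If fit were placed in the scheduled alert itself, the relative time range would be re-evaluated on every run and the model would be retrained on the shifted 30-day window each day.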
https://conf.splunk.com/files/2019/recordings/FN1390.mp4
I would really like to watch that video - however, even when logged into Splunk, I receive an HTTP 401 unauthorized response.
When attempting to view another video from 2018, however (https://conf.splunk.com/files/2018/recordings/app-sorcery-building-splunk-fn1390.mp4), I can access the video.
I take it nobody else is seeing this...?
Hi @thambisetty
Thank you. The video is extremely helpful.
But in the recent version, under MLTK --> Experiments --> Predict Numerical field/Categorical field, we have an option to fit a model. We also have an option to select an algorithm, as below.
Under MLTK --> Experiments --> Detect Numerical field/Categorical field, we don't have an option called fit model, and we also don't have an option to select an algorithm, as below.
My SPL query, after choosing to detect outliers, looks like this:
sourcetype=files command="*abc*" earliest=-90d@d latest=-1d@d|stats count values(file_path) values(user_name) values(action) by device_name,command
| anomalydetection "device_name" "command" "count" "values(user_name)" action=annotate
| eval isOutlier = if(probable_cause != "", "1", "0")
| table "device_name" "command" "count" "values(user_name)", probable_cause, isOutlier
| sort 100000 probable_cause
But it doesn't have any fit command. So do you suggest that I add an algorithm manually to the SPL query?
I hope you have scheduled searches as below:
What I understand from your question is this: when you continuously train your model, will the new training results be appended to the existing training results or not? If that is the question, the answer is below:
Splunk ML Toolkit will overwrite the existing training set with the new training set.
Hello @thambisetty
I applied the fit command once to obtain the training set (last 30 days), and I have scheduled an alert (every 24 hours) that triggers if the number of outliers detected is greater than 0.
So I wanted to know: will it retrain the model every 24 hours because of the scheduled alert, or will it detect outliers using the old training data set?
Note: My scheduled search query, after scheduling the alert, is as below:
sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"
There is also partial_fit=true, which gives incremental learning over time. Only a few algorithms support partial_fit.
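A minimal sketch of incremental training, assuming SGDClassifier (one of the algorithms that accepts partial_fit) and assuming action is the field you want to predict from device_name and command. The model name incremental_action_model and the daily time range are hypothetical.

```spl
sourcetype=list1 earliest=-1d@d latest=@d
| fit SGDClassifier action from device_name command partial_fit=true into incremental_action_model
```

Scheduled daily, each run updates the saved model with the latest day of events instead of replacing it. Without partial_fit=true, running fit into the same model name overwrites the previous model. Check the MLTK algorithm documentation for how categorical values that were not present in the first fit are handled.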