Hi,
I have built an ML model for detecting categorical outliers. The base search for the model covers the last 30 days (the training set). An alert on the same search runs every day and triggers if the number of results is greater than 0.
Today, the alert takes training data from Aug 19 to Sep 17 and reports any outliers it finds.
For tomorrow, I want to confirm: will it run the model again with training data from Aug 20 to Sep 18, or will it keep using the same data from Aug 19 to Sep 17 to detect outliers?
Kindly share your ideas.


You are right.
You can also detect outliers using Splunk core commands. For example, you can use outlier or anomalydetection (anomalydetection builds on the outlier and anomalousvalue commands).
You can also detect outliers with simple statistics such as the standard deviation or the average/mean/median/mode.
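As a rough sketch of the standard-deviation approach using only core commands: the search below assumes the sourcetype and field names used elsewhere in this thread, and the 30-day range and the threshold of 3 standard deviations are illustrative assumptions, not recommended values.

```
sourcetype=list1 earliest=-30d@d latest=@d
| stats count by device_name, command
| eventstats avg(count) as avg_count, stdev(count) as stdev_count
| eval isOutlier = if(abs(count - avg_count) > 3 * stdev_count, 1, 0)
| where isOutlier = 1
```

eventstats adds the overall average and standard deviation to every row, so each count can be compared against them in the eval. An alert on "number of results > 0" would then trigger only when at least one row is flagged.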
If this helps, give a like below.


Detect Numerical field/Categorical field does not use machine learning algorithms; it uses only Splunk core features.
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/AboutMLTK
https://docs.splunk.com/Documentation/MLApp/5.2.0/User/Algorithms
If this helps, give a like below.
Thank you @thambisetty
So if it is a core feature of Splunk, and I did not specify a time range in my query, it must have been checking for anomalies every day over the last 24 hours, with no training set at all.
I should have used one of the three algorithms below.
Am I right?


You asked: "Will it continuously train the model every 24 hours because of the scheduled alert, or will it detect outliers from the old training data set?"
No. An alert scheduled every 24 hours uses the model that is already trained, provided the alert search uses the apply command.
sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"
I don't see any algorithm, fit command, or apply command in the search above. anomalydetection is just a Splunk core command, not an ML algorithm.
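Because anomalydetection saves no model, each scheduled run simply recomputes over whatever time range the search covers. A sketch of the same search with an explicit rolling window follows; the earliest and latest values are an assumption about the window you want (the last 30 full days), and the final where clause reuses the probable_cause check from the assistant-generated search:

```
sourcetype=list1 earliest=-30d@d latest=@d
| stats count values(files) values(user) values(action) by device_name, command
| anomalydetection action=annotate "device_name" "command" "count" "values(user)"
| where probable_cause != ""
```

The relative time modifiers are re-evaluated on every run, so a run on Sep 18 covers Aug 19 to Sep 17 and a run on Sep 19 covers Aug 20 to Sep 18.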
Have a look at the conf presentation below:
https://conf.splunk.com/files/2019/recordings/FN1390.mp4
If this helps, give a like below.
I would really like to watch that video - however, even when logged into Splunk, I receive an HTTP 401 unauthorized response.
When attempting to view another video from 2018, however (https://conf.splunk.com/files/2018/recordings/app-sorcery-building-splunk-fn1390.mp4), I can access the video.
I take it nobody else is seeing this...?
Hi @thambisetty
Thank you. The video is extremely helpful.
But in the recent version, under MLTK --> Experiments --> Predict Numerical field/Categorical field, we have an option to fit a model. We also have an option to select an algorithm, as below.
Under MLTK --> Experiments --> Detect Numerical field/Categorical field, we don't have a fit model option, and we also don't have an option to select an algorithm, as below.
After choosing detect outliers, my SPL query looks like this:
sourcetype=files command="*abc*" earliest=-90d@d latest=-1d@d|stats count values(file_path) values(user_name) values(action) by device_name,command
| anomalydetection "device_name" "command" "count" "values(user_name)" action=annotate
| eval isOutlier = if(probable_cause != "", "1", "0")
| table "device_name" "command" "count" "values(user_name)", probable_cause, isOutlier
| sort 100000 probable_cause
But it doesn't have any fit command. So do you suggest that I add an algorithm manually to the SPL query?


I hope you have scheduled searches as below:
- One to train the model using the fit command
- One to apply the trained model to new data using the apply command
What I understand from your question is: when you train your model continuously, are the new training results appended to the existing ones? If so, the answer is below:
The Splunk ML Toolkit overwrites the existing trained model with the one built from the new training set.
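The two scheduled searches could look roughly like the sketch below. The algorithm choice (DensityFunction), the model name my_outlier_model, the daily binning, and the time ranges are all assumptions for illustration, not something taken from your setup. The output field name IsOutlier(count) is how I understand DensityFunction labels its result; please check it against the algorithm documentation for your MLTK version.

Training search, scheduled as often as you want the model refreshed:

```
sourcetype=list1 earliest=-30d@d latest=@d
| bin _time span=1d
| stats count by _time, device_name, command
| fit DensityFunction count into my_outlier_model
```

Alert search, scheduled every 24 hours:

```
sourcetype=list1 earliest=-1d@d latest=@d
| bin _time span=1d
| stats count by _time, device_name, command
| apply my_outlier_model
| where 'IsOutlier(count)' = 1
```

Both searches aggregate per day so that the counts seen by apply are on the same scale as the counts used for training. Each run of the training search replaces my_outlier_model; the alert search only reads it.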
If this helps, give a like below.
Hello @thambisetty
I applied the fit command once to obtain the training set (last 30 days), and I have scheduled an alert (every 24 hours) that triggers if it detects more than 0 outliers.
So I wanted to know: will it continuously train the model every 24 hours because of the scheduled alert, or will it detect outliers from the old training data set?
Note: my scheduled search query, after scheduling the alert, is as below:
sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"
There is partial_fit=true for incremental learning over time, but only a few algorithms support partial_fit.
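A minimal sketch of the syntax, assuming an algorithm that supports partial_fit (StandardScaler is used here only to show where the option goes; it is a preprocessing step, not an outlier detector) and a hypothetical model name my_incremental_scaler:

```
sourcetype=list1 earliest=-1d@d latest=@d
| bin _time span=1d
| stats count by _time, device_name, command
| fit StandardScaler count partial_fit=true into my_incremental_scaler
```

With partial_fit=true, each scheduled run updates the saved model with the new day's data instead of replacing it. Check the algorithm documentation for your MLTK version to see which algorithms accept this option.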
