Splunk Search

MLTK alerts and training set

Janani_Krish
Path Finder

Hi,

I have built an ML model for detecting categorical outliers. The base search for the model covers the last 30 days (the training set). An alert is scheduled to run every day and fires if the number of results is greater than 0.
For today, the alert takes training data from Aug 19 to Sep 17 and reports any outliers.
For tomorrow, I want to confirm: will it run the model again with training data from Aug 20 to Sep 18, or will it keep using the same data from Aug 19 to Sep 17 to detect outliers?

Kindly share your ideas.


thambisetty
SplunkTrust

You are right.

You can also detect outliers using Splunk core commands. For example, you can use outlier or anomalydetection (anomalydetection builds on the outlier and anomalousvalue commands).

You can also detect them using a simple standard deviation or the average/mean/median/mode.
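
As a rough illustration of the standard-deviation approach, the search below flags rows whose count is more than three standard deviations away from the mean. The sourcetype and the device_name and command fields are borrowed from the searches posted later in this thread; the 30-day range, the multiplier 3, and the helper field names avg_count, stdev_count, and isOutlier are assumptions chosen only for illustration.

```spl
sourcetype=list1 earliest=-30d@d latest=@d
| stats count by device_name, command
| eventstats avg(count) as avg_count, stdev(count) as stdev_count
| eval isOutlier = if(abs(count - avg_count) > 3 * stdev_count, 1, 0)
| where isOutlier = 1
```

Because earliest and latest are relative, every scheduled run recomputes the mean and standard deviation over the most recent 30 days; nothing is stored between runs.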

————————————
If this helps, give a like below.


thambisetty
SplunkTrust

Detect Numerical field/Categorical field doesn't use machine learning algorithms; it just uses Splunk core features.

https://docs.splunk.com/Documentation/MLApp/5.2.0/User/AboutMLTK

https://docs.splunk.com/Documentation/MLApp/5.2.0/User/Algorithms


Janani_Krish
Path Finder

Thank you @thambisetty 

So if it is a core feature of Splunk, and I did not specify a time range in my query, it must have been checking for anomalies over the last 24 hours each day. It did not have a training set at all.

I should have used one of the three algorithms below.

Am I right?


thambisetty
SplunkTrust

"So I wanted to know: will it continuously retrain the model every 24 hours because of the scheduled alert, or will it detect outliers using the old training data set?"

No. An alert scheduled to run every 24 hours uses the model that has already been trained, provided the alert search uses the apply command.

sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"

I don't see any algorithm, fit command, or apply command in the search above. anomalydetection is just a Splunk core command, not an ML algorithm.
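
For contrast, an alert that reuses a stored model would contain an apply step. The sketch below assumes a model was saved earlier under the hypothetical name device_command_outliers by a training search that used the MLTK DensityFunction algorithm on the count field. Neither the model name nor the choice of algorithm comes from the original search; 'IsOutlier(count)' is the output field that algorithm produces for a field named count.

```spl
sourcetype=list1 earliest=-1d@d latest=@d
| stats count by device_name, command
| apply device_command_outliers
| where 'IsOutlier(count)' > 0
```

Because apply only scores the new data, the model stays exactly as it was at training time until a fit search is run again.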

Have a look at the .conf presentation below:

https://conf.splunk.com/files/2019/recordings/FN1390.mp4


jasongb
Path Finder

I would really like to watch that video - however, even when logged into Splunk, I receive an HTTP 401 unauthorized response.

When attempting to view another video from 2018, however (https://conf.splunk.com/files/2018/recordings/app-sorcery-building-splunk-fn1390.mp4), I can access the video.

I take it nobody else is seeing this...?


Janani_Krish
Path Finder

Hi @thambisetty 

Thank you. The video is extremely helpful.

But in the recent version, under MLTK --> Experiments --> Predict Numerical field/Categorical field, we have an option to fit a model. We also have an option to select an algorithm, as shown below. (screenshot: Capture.PNG)

In MLTK --> Experiments --> Detect Numerical field/Categorical field, we don't have a fit model option, and we also don't have an option to select an algorithm, as shown below.

(screenshot: image.png)

After choosing detect outliers, my SPL query looks like this:

sourcetype=files command="*abc*" earliest=-90d@d latest=-1d@d
| stats count values(file_path) values(user_name) values(action) by device_name,command
| anomalydetection "device_name" "command" "count" "values(user_name)" action=annotate
| eval isOutlier = if(probable_cause != "", "1", "0")
| table "device_name" "command" "count" "values(user_name)", probable_cause, isOutlier
| sort 100000 probable_cause

But it doesn't have any fit command. So do you suggest that I have to add an algorithm manually in SPL query?


thambisetty
SplunkTrust

I hope you have scheduled searches as below:

  1.  To train the model using the fit command.
  2.  To apply the trained model to new data using the apply command.

What I understand from your question is this: when you continuously train your model, are the new training results appended to the existing training results? If that is the question, the answer is below:

The Splunk ML Toolkit overwrites the existing training set with the new training set; it does not append.
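
A minimal sketch of the training search from step 1, assuming the hypothetical model name device_command_outliers and the DensityFunction algorithm (neither appears in the original question). Scheduled daily with a relative 30-day range, each run refits on the latest window and replaces the saved model of the same name:

```spl
sourcetype=list1 earliest=-30d@d latest=@d
| stats count by device_name, command
| fit DensityFunction count by "command" into device_command_outliers
```

The alert from step 2 would then end in | apply device_command_outliers, so it always scores new data against the most recently saved model.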


Janani_Krish
Path Finder

Hello @thambisetty 

I applied the fit command once to obtain the training set (last 30 days), and I have scheduled an alert (every 24 hours) that fires if it detects more than 0 outliers.

So I wanted to know: will it continuously retrain the model every 24 hours because of the scheduled alert, or will it detect outliers using the old training data set?

Note: my scheduled search query, after scheduling the alert, is as below:

sourcetype=list1 |stats count values(files) values(user) values(action) by device_name,command | anomalydetection action=annotate "device_name" "command" "count" "values(user)"


Swathi1
Loves-to-Learn Lots

There is a partial_fit=true option for incremental learning over time. partial_fit is supported by only a few algorithms.
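
A hedged sketch of what that could look like, using SGDClassifier, one of the algorithms that accepts partial_fit. The label field is_suspicious, the distinct_users feature, and the model name incremental_model are all hypothetical and do not come from the searches in this thread. The idea is that each run updates the saved model with only the newest day of data instead of retraining on the full window:

```spl
sourcetype=list1 earliest=-1d@d latest=@d
| stats count dc(user) as distinct_users max(is_suspicious) as is_suspicious by device_name, command
| fit SGDClassifier is_suspicious from count distinct_users partial_fit=true into incremental_model
```

Without partial_fit=true, the same search would replace incremental_model with one trained on that single day only.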
