Not sure if this is a good way to approach it, but currently I do not have access to the Splunk Machine Learning Toolkit due to computer management policies. I would like to check whether I can use any self-written logic in Splunk to detect outliers in a given set of data.
Might be too broad of a question.
Thanks in advance.
@quahfamili, can you install the Splunk Machine Learning Toolkit (MLTK) on your personal machine? MLTK lets you view the underlying queries and macros it uses to find outliers, which you can then implement in your own Splunk instance. However, if a query relies on an MLTK algorithm, it cannot be used unless MLTK is installed on the instance where you want to identify outliers.
It is currently not possible to install it on that system.
I was just thinking aloud to see whether it is possible to write a simple algorithm in Splunk, something like a manual trend line with a threshold, to build a model and check each data point against it.
I would actually like to install MLTK to test, but I cannot.
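As a rough illustration of the trend-line idea, here is an offline Python sketch (not Splunk code) that fits a least-squares straight line to evenly spaced points, then flags any point whose residual exceeds m times the standard deviation of all residuals. The function name trend_outliers, the straight-line model, and the default m=2 are assumptions for illustration only.

```python
from statistics import mean, pstdev


def trend_outliers(values, m=2.0):
    """Hypothetical helper: flag points far from a least-squares trend line.

    Assumes evenly spaced points (index 0, 1, 2, ...) and at least two values.
    Returns 1 for an outlier, 0 otherwise.
    """
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    # Ordinary least-squares slope and intercept
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Distance of each point from the fitted line
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    spread = pstdev(residuals)
    # Threshold: m standard deviations of the residuals
    return [1 if abs(r) > m * spread else 0 for r in residuals]


print(trend_outliers([10, 11, 12, 13, 50, 15, 16, 17]))
```

Note that a large outlier pulls the fitted line toward itself and inflates the residual spread, so with very few points a single spike can hide below the threshold.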
What I meant was that for the Interquartile Range (IQR) and Median Absolute Deviation (MAD) methods, you can use built-in Splunk queries, which are the same ones the Machine Learning Toolkit uses. These do not require MLTK to run, but to get those queries you will have to install MLTK somewhere (maybe your personal laptop with Splunk Enterprise), not on the machine where you are building the outlier detection.
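For reference, here is what the two threshold methods compute, written as a plain Python sketch rather than the queries MLTK generates. The function names and the default multipliers (1.5 × IQR, 3 × MAD) are hypothetical choices. Python's statistics.quantiles uses its own interpolation method, so its quartiles may differ slightly from Splunk's perc25/perc75.

```python
from statistics import median, quantiles


def iqr_outliers(values, k=1.5):
    """Hypothetical helper: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [1 if v < lower or v > upper else 0 for v in values]


def mad_outliers(values, k=3.0):
    """Hypothetical helper: flag values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # Caveat: if more than half the values are identical, mad is 0 and any
    # value that differs from the median is flagged.
    return [1 if abs(v - med) > k * mad else 0 for v in values]


data = [10, 12, 11, 13, 12, 11, 100]
print(iqr_outliers(data))
print(mad_outliers(data))
```

Both methods are based on quartiles or medians, so a single extreme value does not shift the threshold much, unlike a mean and standard deviation.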
If you want to create your own custom search command, you can use the Splunk SDK for Python. However, it is easier and better to integrate machine learning algorithms via MLTK rather than starting from scratch on your own.
@niketnilay I also do not have admin access to the Splunk system side. Thanks for the reply; I will read up on what you shared. I might have to see whether it is possible to add some logic in Python.
You can detect outliers in a number of ways, e.g.
| timechart span=1h count | streamstats window=24 avg(count) as avg stdev(count) as std | eval m=2, lower=avg-(std * m), upper=avg+(std * m), outlier = if(count < lower OR count > upper, 1, 0) | table _time upper lower count outlier
Then, if you plot the outlier field as a chart overlay with a maximum Y-axis value of, say, 2, each outlier will appear as a vertical line.
Adjust m to set a suitable standard deviation multiplier.
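The streamstats search above can be mirrored in Python to check the logic offline. This sketch assumes, as streamstats does by default, that each window includes the current point, and it uses the sample standard deviation, which is what Splunk's stdev returns. Rows with fewer than two points in their window are simply not flagged; that is an assumption of this sketch, and the function name is hypothetical.

```python
from statistics import mean, stdev


def rolling_stdev_outliers(counts, window=24, m=2.0):
    """Hypothetical helper mirroring: streamstats window=24 avg(count) stdev(count)."""
    flags = []
    for i, count in enumerate(counts):
        # The window holds up to `window` points and ends at the current one
        win = counts[max(0, i - window + 1): i + 1]
        if len(win) < 2:
            flags.append(0)  # assumption: too little history to judge
            continue
        avg = mean(win)
        std = stdev(win)  # sample standard deviation
        lower, upper = avg - std * m, avg + std * m
        flags.append(1 if count < lower or count > upper else 0)
    return flags


print(rolling_stdev_outliers([10] * 10 + [50]))
```

Because the current point is part of its own window, a large spike also raises avg and std. streamstats has a current option (current=f) that excludes the current event if that matters for your data.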