Approaches to Identifying Patterns in Outliers

bschaap
Path Finder

I would like to know what approaches to take for detecting patterns in outliers using Splunk. I'm familiar with approaches for detecting outliers, but I would like Splunk to help identify what the outliers have in common, to speed up investigating them. For instance, are there field values that the outliers share? Or do those values typically exceed a certain threshold?

Thanks!

Brian

niketn
Legend

@bschaap, you should try the Splunk Machine Learning Toolkit app, which needs the Python for Scientific Computing add-on to work.

The Machine Learning Toolkit provides several examples for detecting and analyzing numerical and categorical outliers, using both machine learning algorithms and standard outlier detection mechanisms. Refer to the following documentation and the showcase example on YouTube.
With your sample/test data you can experiment with thresholds, algorithms, and other critical parameters to ensure that outliers are detected as expected. You can then capture the outlier SPL queries and apply them to your own use cases.

http://docs.splunk.com/Documentation/MLApp/latest/User/Showcaseexamples
https://www.youtube.com/watch?v=8POjmd9LYdY&index=5&list=PLxkFdMSHYh3Q1jwpgJJ0ZSnRzZIx2c_KM

Machine Learning Toolkit also provides several visualizations specifically for outlier detection and interpretation: http://docs.splunk.com/Documentation/MLApp/latest/User/Thebasicprocessofmachinelearning
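
On the original question of which field values the outliers share, one simple idea is to compare how often each value appears among the outliers with how often it appears overall. The sketch below illustrates that in Python rather than SPL, and it is not taken from the Machine Learning Toolkit. The function name `common_values`, the `min_share` cutoff, and the sample events are all hypothetical.

```python
from collections import Counter

def common_values(outliers, all_events, field, min_share=0.5):
    """Return values of `field` that are over-represented among the outliers.

    Each result is (value, share_in_outliers, share_in_all). A value is kept
    when it appears in at least `min_share` of the outliers and is more
    common among the outliers than in the full event set.
    """
    if not outliers or not all_events:
        return []
    out_counts = Counter(e[field] for e in outliers if field in e)
    all_counts = Counter(e[field] for e in all_events if field in e)
    results = []
    for value, n in out_counts.items():
        out_share = n / len(outliers)
        all_share = all_counts[value] / len(all_events)
        if out_share >= min_share and out_share > all_share:
            results.append((value, out_share, all_share))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical events; the first three are the ones already flagged as outliers.
all_events = [
    {"host": "web01"}, {"host": "web01"}, {"host": "web02"},
    {"host": "web01"}, {"host": "web02"}, {"host": "web02"},
    {"host": "web03"}, {"host": "web03"},
]
outliers = all_events[:3]
shared = common_values(outliers, all_events, "host")
```

Here `web01` accounts for two of the three outliers but only three of the eight events overall, so it is the only value reported. Running the same comparison over each field of interest gives a short list of candidates to investigate first.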

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

skoelpin
SplunkTrust

The Splunk ML app uses the predict command for all time series forecasting. The added benefit of using the app is its outlier visualization. A better approach would be to take time slices of events over several weeks and create a range of normal. Once you have this, you can apply regressors from the ML app to your model.

skoelpin
SplunkTrust

To start, you can use the predict command to establish upper and lower bounds for what is "normal" and alert on anything outside those bounds. The limitation is that you can't train a model on your data, so you have to run a large search each time the predict command runs.
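
To make the bounds idea concrete, here is a minimal sketch in Python rather than SPL. It does not reproduce what the predict command computes; it swaps in a plain mean plus or minus k standard deviations band as a stand-in. The function names, the multiplier `k`, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def normal_band(history, k=2.0):
    """Return (lower, upper) as mean +/- k sample standard deviations."""
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread

def out_of_band(values, lower, upper):
    """Return the (index, value) pairs that fall outside the band."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical event counts per interval, treated as the "normal" history.
history = [100, 102, 98, 101, 99, 103, 97, 100]
lower, upper = normal_band(history, k=2.0)

# New observations to check against the band.
alerts = out_of_band([101, 140, 99, 60], lower, upper)
```

With this history the band works out to roughly 96 to 104, so the values 140 and 60 are flagged. Note that the band is recomputed from the full history on every run, which mirrors the cost described above.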

A better approach would be to use relative_time with 15-minute spans, then clone and shift your data into their time slots, which will allow you to run fast searches over massive data sets without taxing your hardware.
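
The time-slot idea can be sketched roughly as follows, again in Python rather than SPL. This is one interpretation: each 15-minute slot is compared with the same weekday and time of day in earlier weeks, and the per-slot range is computed once and then reused. The helper names, the multiplier `k`, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, stdev

SLOT_MINUTES = 15

def slot_key(ts):
    """Map a timestamp to its (weekday, 15-minute slot of the day) bucket."""
    return ts.weekday(), (ts.hour * 60 + ts.minute) // SLOT_MINUTES

def build_baseline(samples, k=2.0):
    """samples: iterable of (datetime, count). Returns {slot: (lower, upper)}."""
    by_slot = defaultdict(list)
    for ts, count in samples:
        by_slot[slot_key(ts)].append(count)
    baseline = {}
    for key, counts in by_slot.items():
        if len(counts) < 2:
            continue  # need at least two observations to estimate a spread
        center, spread = mean(counts), stdev(counts)
        baseline[key] = (center - k * spread, center + k * spread)
    return baseline

def is_outlier(baseline, ts, count):
    """True when count falls outside the stored range for that slot."""
    bounds = baseline.get(slot_key(ts))
    if bounds is None:
        return False  # no history for this slot, so nothing to compare with
    lower, upper = bounds
    return count < lower or count > upper

# Hypothetical counts for the Monday 09:00 slot over four consecutive weeks.
start = datetime(2024, 1, 1, 9, 0)  # a Monday
history = [(start + timedelta(weeks=w), c) for w, c in enumerate([100, 104, 96, 100])]
baseline = build_baseline(history)

next_monday = start + timedelta(weeks=4)
spike = is_outlier(baseline, next_monday, 150)
typical = is_outlier(baseline, next_monday, 101)
```

Because the baseline is a small lookup keyed by slot, checking a new value only needs that slot's stored range rather than a rescan of several weeks of raw events.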
