Approaches to Identifying Patterns in Outliers

bschaap
Path Finder

I would like to know what approaches to take for detecting patterns in outliers using Splunk. I'm familiar with approaches to detect outliers but would like Splunk to help identify what things are in common to help speed up investigation of outliers. For instance, are there values in any of the fields that are common between the outliers? Or do those values typically exceed a certain threshold?

Thanks!

Brian

niketn
Legend

@bschaap, you should try the Splunk Machine Learning Toolkit app, which needs the Python for Scientific Computing add-on to work.

The Machine Learning Toolkit provides several examples for detecting and analyzing numerical and categorical outliers, using both machine learning algorithms and standard outlier detection mechanisms. Refer to the following documentation and the showcase example on YouTube.
With your sample/test data you can experiment with thresholds, algorithms, and several other critical parameters to make sure outliers are detected as expected. You can then capture the outlier SPL queries and apply them to your own use cases.

http://docs.splunk.com/Documentation/MLApp/latest/User/Showcaseexamples
https://www.youtube.com/watch?v=8POjmd9LYdY&index=5&list=PLxkFdMSHYh3Q1jwpgJJ0ZSnRzZIx2c_KM

Machine Learning Toolkit also provides several visualizations specifically for outlier detection and interpretation: http://docs.splunk.com/Documentation/MLApp/latest/User/Thebasicprocessofmachinelearning
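
To make the idea of experimenting with a threshold and then looking for what the outliers have in common more concrete, here is a minimal sketch in plain Python, outside Splunk. It is not the toolkit's implementation. The field names (host, status, response_ms), the sample events, and the helper names are all made up for illustration. It flags numeric outliers by median absolute deviation, then ranks field values by how over-represented they are among the outliers.

```python
from collections import Counter
from statistics import median


def mad_outliers(events, field, threshold=3.0):
    """Return events whose `field` is more than `threshold` MADs from the median."""
    values = [e[field] for e in events]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    return [e for e in events if abs(e[field] - med) / mad > threshold]


def common_values(events, outliers, fields):
    """Rank (field, value) pairs by share among outliers divided by share overall."""
    rows = []
    for f in fields:
        overall = Counter(e[f] for e in events)
        among = Counter(e[f] for e in outliers)
        for value, n in among.items():
            share_out = n / len(outliers)
            share_all = overall[value] / len(events)
            rows.append((f, value, share_out, share_out / share_all))
    # Highest lift first: values far more frequent among outliers than overall
    return sorted(rows, key=lambda r: r[3], reverse=True)


# Hypothetical sample events (assumed data, not from any real index)
events = [
    {"host": "web1", "status": "200", "response_ms": 100},
    {"host": "web2", "status": "200", "response_ms": 102},
    {"host": "web1", "status": "200", "response_ms": 98},
    {"host": "web2", "status": "200", "response_ms": 101},
    {"host": "web1", "status": "200", "response_ms": 99},
    {"host": "web2", "status": "200", "response_ms": 103},
    {"host": "web1", "status": "200", "response_ms": 97},
    {"host": "web2", "status": "200", "response_ms": 100},
    {"host": "web3", "status": "500", "response_ms": 900},
    {"host": "web3", "status": "200", "response_ms": 950},
]

outliers = mad_outliers(events, "response_ms", threshold=3.0)
ranking = common_values(events, outliers, ["host", "status"])
for field, value, share, lift in ranking:
    print(f"{field}={value}: {share:.0%} of outliers, lift {lift:.2f}")
```

A lift well above 1 means the value shows up far more often among the outliers than in the data as a whole, which is a reasonable place to start an investigation.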

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

skoelpin
SplunkTrust

The Splunk ML app uses the predict command for all time series forecasting. The added benefit of using the app is the outlier visualization. A better approach would be to take time slices of events over several weeks and build a range of what is normal. Once you have this, you can apply regressors from the ML app to your model.


skoelpin
SplunkTrust

To start, you can use the predict command to establish upper and lower bounds for what is "normal" and alert on anything outside those bounds. The limitation is that you can't train on your data, so a large search has to run each time the predict command runs.
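
The upper/lower bound idea can be sketched in plain Python as follows. This is not what the predict command does internally. It swaps in a simple rolling mean plus or minus k standard deviations as a stand-in, and the window size, k, and the sample counts are assumptions chosen for illustration. Like the approach above, it recomputes the bounds from recent history on every run, with nothing trained or stored.

```python
from statistics import mean, stdev


def flag_outside_bounds(counts, window=4, k=2.0):
    """Flag points outside mean +/- k * stdev of the preceding `window` points."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        centre = mean(history)
        spread = stdev(history)  # sample standard deviation of the window
        lower, upper = centre - k * spread, centre + k * spread
        if not lower <= counts[i] <= upper:
            flagged.append((i, counts[i], lower, upper))
    return flagged


# Hypothetical event counts per time bucket (assumed data)
counts = [10, 12, 11, 13, 12, 40, 12, 11]
flagged = flag_outside_bounds(counts, window=4, k=2.0)
for index, value, lower, upper in flagged:
    print(f"bucket {index}: {value} outside [{lower:.1f}, {upper:.1f}]")
```

Note that once the spike enters the window it widens the bounds for the following buckets, which is one reason a baseline built from several weeks of history tends to be more stable.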

A better approach would be to use relative_time with 15-minute spans, then clone and shift your data into their time slots, which will allow you to run fast searches over massive data sets without taxing your hardware.
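
As a rough illustration of the time-slot idea, again in plain Python and not SPL, the sketch below groups several weeks of counts by their 15-minute slot of the week and keeps a "normal" range per slot. The helper names, the choice of mean plus or minus two standard deviations, and the sample data are assumptions. In Splunk the shifting would be done on _time itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def slot_key(ts):
    """Map a timestamp to its 15-minute slot of the week."""
    return (ts.weekday(), ts.hour, ts.minute // 15)


def build_baseline(history, k=2.0):
    """Return {slot: (lower, upper)} from (timestamp, count) pairs spanning several weeks."""
    by_slot = defaultdict(list)
    for ts, count in history:
        by_slot[slot_key(ts)].append(count)
    baseline = {}
    for slot, values in by_slot.items():
        centre, spread = mean(values), pstdev(values)
        baseline[slot] = (centre - k * spread, centre + k * spread)
    return baseline


def is_outlier(baseline, ts, count):
    """True when `count` falls outside the stored range for the slot of `ts`."""
    bounds = baseline.get(slot_key(ts))
    if bounds is None:
        return False  # no history for this slot, so no verdict
    lower, upper = bounds
    return not lower <= count <= upper


# Hypothetical history: the Monday 09:00 slot over four consecutive weeks
start = datetime(2024, 1, 1, 9, 0)  # a Monday
history = [(start + timedelta(weeks=w), c) for w, c in enumerate([100, 110, 90, 100])]
baseline = build_baseline(history)

# A new observation in week five, seven minutes into the same slot
now = start + timedelta(weeks=4, minutes=7)
print(is_outlier(baseline, now, 300))
print(is_outlier(baseline, now, 105))
```

Because the baseline is a small lookup keyed by slot, checking a new value only needs the current bucket and does not rescan weeks of raw events.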
