Knowledge Management

Approaches to Identifying Patterns in Outliers

bschaap
Path Finder

I would like to know what approaches to take for detecting patterns in outliers using Splunk. I'm familiar with approaches to detect outliers but would like Splunk to help identify what things are in common to help speed up investigation of outliers. For instance, are there values in any of the fields that are common between the outliers? Or do those values typically exceed a certain threshold?

Thanks!

Brian

0 Karma

niketn
Legend

@bschaap, you should try the Splunk Machine Learning Toolkit app along with the Python for Scientific Computing add-on, which it requires to work.

The Machine Learning Toolkit provides several examples for detecting and analyzing numeric and categorical outliers through a number of machine learning algorithms and standard outlier detection mechanisms. Refer to the following documentation and the showcase example on YouTube.
With your sample/test data you can experiment with thresholds, algorithms, and several other critical parameters to ensure that outliers are detected as expected. You can then capture the outlier SPL queries and apply them to your own use cases.

http://docs.splunk.com/Documentation/MLApp/latest/User/Showcaseexamples
https://www.youtube.com/watch?v=8POjmd9LYdY&index=5&list=PLxkFdMSHYh3Q1jwpgJJ0ZSnRzZIx2c_KM

Machine Learning Toolkit also provides several visualizations specifically for outlier detection and interpretation: http://docs.splunk.com/Documentation/MLApp/latest/User/Thebasicprocessofmachinelearning
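
As a rough illustration only (the index, sourcetype, and field names below are placeholders for your own data), one of the threshold methods in the numeric outliers showcase (median absolute deviation) looks roughly like this in plain SPL, with the flagged events piped into top to surface which field values the outliers have in common:

index=web sourcetype=access_combined
| eventstats median(response_time) as med
| eval abs_dev=abs(response_time - med)
| eventstats median(abs_dev) as mad
| eval lower_bound=med - 5 * mad, upper_bound=med + 5 * mad
| eval is_outlier=if(response_time < lower_bound OR response_time > upper_bound, 1, 0)
| search is_outlier=1
| top limit=10 uri_path http_status clientip

The multiplier 5 controls how far from the median a value must fall before it is flagged; tune it against your sample data. Because top is given several fields, it counts combinations of values, which directly answers "what do these outliers have in common".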

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

skoelpin
SplunkTrust

The Splunk ML app uses the predict command for all time-series forecasting. The added benefit of using the app is its outlier visualizations. A better approach would be to take time slices of events over several weeks and create a range of normal. Once you have this, you can apply regressors from the ML app to your model.
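
Something along these lines (the index and field names here are made up) builds that range of normal from a few weeks of history, keyed on day of week and 15-minute slot:

index=web earliest=-28d@d latest=@d
| bin _time span=15m
| stats count as events by _time
| eval slot=strftime(_time, "%a %H:%M")
| stats avg(events) as avg_events stdev(events) as stdev_events by slot
| eval lower=avg_events - 3 * stdev_events, upper=avg_events + 3 * stdev_events
| outputlookup events_baseline_15m.csv

Save the baseline to a lookup (the lookup name is just an example), then have your alert search compute the current 15-minute count, look up the bounds for its slot, and fire when the count falls outside them.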

0 Karma

skoelpin
SplunkTrust

To start, you can use the predict command and establish upper and lower bounds that define what is "normal", then alert on anything outside those bounds. The limitation is that you can't train a model ahead of time, so you have to run a large search every time the predict command runs.
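
A minimal sketch of that (the index name is made up, and the exact output field names predict produces can vary slightly between versions):

index=web
| timechart span=15m count as events
| predict events as prediction algorithm=LLP5 upper95=upper lower95=lower
| eval is_outlier=if(events > 'upper(prediction)' OR events < 'lower(prediction)', 1, 0)
| where is_outlier=1

The drawback shows up here: the whole timechart has to be recomputed from raw events every time the alert runs, because predict can't persist anything it learned from previous runs.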

A better approach would be to use relative_time with 15-minute spans, then clone and shift your data into matching time slots. This lets you run fast searches over massive data sets without taxing your hardware.
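
A rough sketch of the clone-and-shift idea (the index name is made up): bucket two weeks of counts into 15-minute slots, then shift last week's buckets forward with relative_time so they line up against this week's in the same time slots:

index=web earliest=-14d@d latest=@d
| bin _time span=15m
| stats count as events by _time
| eval week=if(_time >= relative_time(now(), "-7d@d"), "this_week", "last_week")
| eval _time=if(week="last_week", relative_time(_time, "+7d"), _time)
| chart values(events) over _time by week

In practice you would likely point the first two lines at a summary index or an accelerated data model rather than raw events, which is what keeps the search fast over massive data sets.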

0 Karma