Knowledge Management

Approaches to Identifying Patterns in Outliers

bschaap
Path Finder

I would like to know what approaches to take for detecting patterns in outliers using Splunk. I'm familiar with approaches to detect outliers but would like Splunk to help identify what things are in common to help speed up investigation of outliers. For instance, are there values in any of the fields that are common between the outliers? Or do those values typically exceed a certain threshold?

Thanks!

Brian

0 Karma

niketn
Legend

@bschaap, you should try the Splunk Machine Learning Toolkit app along with the Python for Scientific Computing add-on, which it requires to work.

The Machine Learning Toolkit provides several examples for detecting and analyzing numeric and categorical outliers through a number of machine learning algorithms and standard outlier detection mechanisms. Refer to the following documentation and the showcase example on YouTube.
With your sample/test data you can experiment with thresholds, algorithms, and several other critical parameters to ensure that outliers are detected as expected. You can then capture the outlier SPL queries and apply them to your own use cases.

http://docs.splunk.com/Documentation/MLApp/latest/User/Showcaseexamples
https://www.youtube.com/watch?v=8POjmd9LYdY&index=5&list=PLxkFdMSHYh3Q1jwpgJJ0ZSnRzZIx2c_KM

Machine Learning Toolkit also provides several visualizations specifically for outlier detection and interpretation: http://docs.splunk.com/Documentation/MLApp/latest/User/Thebasicprocessofmachinelearning
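
As a rough illustration only (the index, sourcetype, and field names below are placeholders for your own data), one of the threshold methods in the numeric outliers showcase (median absolute deviation) looks roughly like this in plain SPL, with the flagged events piped into top to surface which field values the outliers have in common:

index=web sourcetype=access_combined
| eventstats median(response_time) as med
| eval abs_dev=abs(response_time - med)
| eventstats median(abs_dev) as mad
| eval lower_bound=med - 5 * mad, upper_bound=med + 5 * mad
| eval is_outlier=if(response_time < lower_bound OR response_time > upper_bound, 1, 0)
| search is_outlier=1
| top limit=10 uri_path http_status clientip

The multiplier 5 controls how far from the median a value must fall before it is flagged; tune it against your sample data. Because top is given several fields, it counts combinations of values, which directly answers "what do these outliers have in common".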

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"
0 Karma

skoelpin
SplunkTrust

The Splunk ML app uses the predict command for all time-series forecasting. The added benefit of using the app is its outlier visualizations. A better approach would be to take time slices of events over several weeks and create a range of normal. Once you have this, you can apply regressors from the ML app to your model.
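
Something along these lines (the index and field names here are made up) builds that range of normal from a few weeks of history, keyed on day of week and 15-minute slot:

index=web earliest=-28d@d latest=@d
| bin _time span=15m
| stats count as events by _time
| eval slot=strftime(_time, "%a %H:%M")
| stats avg(events) as avg_events stdev(events) as stdev_events by slot
| eval lower=avg_events - 3 * stdev_events, upper=avg_events + 3 * stdev_events
| outputlookup events_baseline_15m.csv

Save the baseline to a lookup (the lookup name is just an example), then have your alert search compute the current 15-minute count, look up the bounds for its slot, and fire when the count falls outside them.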

0 Karma

skoelpin
SplunkTrust

To start, you can use the predict command and establish upper and lower bounds that define what is "normal", then alert on anything outside those bounds. The limitation is that you can't train a model ahead of time, so you have to run a large search every time the predict command runs.
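
A minimal sketch of that (the index name is made up, and the exact output field names predict produces can vary slightly between versions):

index=web
| timechart span=15m count as events
| predict events as prediction algorithm=LLP5 upper95=upper lower95=lower
| eval is_outlier=if(events > 'upper(prediction)' OR events < 'lower(prediction)', 1, 0)
| where is_outlier=1

The drawback shows up here: the whole timechart has to be recomputed from raw events every time the alert runs, because predict can't persist anything it learned from previous runs.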

A better approach would be to use relative_time with 15-minute spans, then clone and shift your data into matching time slots. This lets you run fast searches over massive data sets without taxing your hardware.
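
A rough sketch of the clone-and-shift idea (the index name is made up): bucket two weeks of counts into 15-minute slots, then shift last week's buckets forward with relative_time so they line up against this week's in the same time slots:

index=web earliest=-14d@d latest=@d
| bin _time span=15m
| stats count as events by _time
| eval week=if(_time >= relative_time(now(), "-7d@d"), "this_week", "last_week")
| eval _time=if(week="last_week", relative_time(_time, "+7d"), _time)
| chart values(events) over _time by week

In practice you would likely point the first two lines at a summary index or an accelerated data model rather than raw events, which is what keeps the search fast over massive data sets.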

0 Karma