Splunk Search

How to use outlier command?

jeelong
Explorer

In Splunk documentation for the outlier command, it say:

" The transform option truncates the outlying values to the threshold for outliers."

Would like to understand how it calculates the threshold mentioned above. 

For this SPL below, the total_bytes value of 92000, is replaced with 000244. How does Splunk come up with the value of 244?

 

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data | eval splitted = split(data,",") | eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date| outlier action=transform mark=true total_bytes | rename total_bytes as transform_total_bytes

 

 

Labels (1)
Tags (1)
0 Karma
1 Solution

ITWhisperer
SplunkTrust
SplunkTrust

It looks like this is based on the interquartile range (note param option - https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Outlier...)

You can validate this with this example

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data 
| eval splitted = split(data,",") 
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval upper=p75+(iqr*1.5)
| outlier action=transform mark=true total_bytes

View solution in original post

ITWhisperer
SplunkTrust
SplunkTrust

It looks like this is based on the interquartile range (note param option - https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Outlier...)

You can validate this with this example

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data 
| eval splitted = split(data,",") 
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval upper=p75+(iqr*1.5)
| outlier action=transform mark=true total_bytes

jeelong
Explorer

Thanks alot ITWhisperer. You have increased my understanding a great deal. 

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval lower=p25-(iqr*1.5)
| eval upper=p75+(iqr*1.5)
| outlier action=transform param=3 mark=true total_bytes

I am still not sure on the results from outlier though.

Given the above, why are the 2 rows with a value of "3" not flagged as an outlier? I would have thought they would be replaced with "174".

Also, if I put in a param of 3, to override the default of 2.5, how does Splunk come up with the number of "250"  to replace the "92000"?

 

 

 

Tags (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

3 is not flagged because you haven't used uselower=t - it defaults to uselower=f

As for why splunk is picking the values it is, to be honest, I don't know - I just found a relationship that worked for your first example.

Personally, if I don't know how something works, I don't usually use it. For all we know, there might be a bug in the calculation - there certainly something that we are missing.

So, my question to you is, why are you using action=transform?

What do you see the value in transforming the outliers rather than just removing them?

Given that we have our own method of generating a replacement value (albeit a different one to that used by splunk except in one instance), why not use something that is known (that's what I would do until I understood what splunk is doing)? 😀

0 Karma

jeelong
Explorer

Thanks ITWhisperer.

I have been finding outliers using the that p25 and p75 function to date. 

Had created some fairly complex SPL to get the outlier, remove them, and create a baseline for current comparison. 

This is so we could find spikes in current data compared to a baseline created from previous 365 days.

It works, mostly. But much room for improvement. To this end I have begun looking at Splunk MLTK to see if I could get better results from it. 

I will be diving into "anomalydetection" and "persist" for instance. As I am not a data scientist, I will no doubt be winging it to a large extent. I did want to understand as much as possible what these are doing under the hood. But knew I would have to "trust in the force" to some extent. 

If I cannot easily decipher what the outlier command is returning then it is not a good sign for when I dive deeper into MLTK. 😥

Oh well. Crash or crash through as they say. 😀 Thanks again for your insights. 

 

 

 

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @jeelong,

did you tried the outlier command without options?

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data 
| eval splitted = split(data,",") 
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| outlier  
| rename total_bytes as transform_total_bytes

Ciao.

Giuseppe

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In September, the Splunk Threat Research Team had two releases of new security content via the Enterprise ...

New in Observability - Improvements to Custom Metrics SLOs, Log Observer Connect & ...

The latest enhancements to the Splunk observability portfolio deliver improved SLO management accuracy, better ...

Improve Data Pipelines Using Splunk Data Management

  Register Now   This Tech Talk will explore the pipeline management offerings Edge Processor and Ingest ...