Splunk Search

How to use outlier command?

jeelong
Explorer

In the Splunk documentation for the outlier command, it says:

"The transform option truncates the outlying values to the threshold for outliers."

I would like to understand how it calculates the threshold mentioned above.

In the SPL below, the total_bytes value of 92000 is replaced with 000244. How does Splunk come up with the value of 244?

 

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1), total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| outlier action=transform mark=true total_bytes
| rename total_bytes as transform_total_bytes

 

 


ITWhisperer
SplunkTrust

It looks like this is based on the interquartile range (note param option - https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Outlier...)

You can validate this with the following example:

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data 
| eval splitted = split(data,",") 
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval upper=p75+(iqr*1.5)
| outlier action=transform mark=true total_bytes

jeelong
Explorer

Thanks a lot, ITWhisperer. You have increased my understanding a great deal.

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval lower=p25-(iqr*1.5)
| eval upper=p75+(iqr*1.5)
| outlier action=transform param=3 mark=true total_bytes

I am still not sure about the results from outlier, though.

Given the above, why are the 2 rows with a value of "3" not flagged as outliers? I would have thought they would be replaced with "174".

Also, if I set param to 3 to override the default of 2.5, how does Splunk come up with the number "250" to replace the "92000"?

 

 

 


ITWhisperer
SplunkTrust

3 is not flagged because you haven't used uselower=t - it defaults to uselower=f
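For reference, here is the same search with uselower enabled, as a minimal sketch. The uselower option is documented for the outlier command; whether the rows with a value of 3 are then replaced, and with what value, is something to check in the output rather than something this sketch asserts.

```spl
| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1), total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| outlier action=transform uselower=true mark=true total_bytes
```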

As for why Splunk is picking the values it is, to be honest, I don't know - I just found a relationship that worked for your first example.

Personally, if I don't know how something works, I don't usually use it. For all we know, there might be a bug in the calculation - there is certainly something that we are missing.

So, my question to you is, why are you using action=transform?

What value do you see in transforming the outliers rather than just removing them?

Given that we have our own method of generating a replacement value (albeit a different one to that used by Splunk except in one instance), why not use something that is known (that's what I would do until I understood what Splunk is doing)? 😀
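A minimal sketch of that approach: cap values above the upper fence with a known formula instead of relying on action=transform. The 1.5 multiplier is the one from the validation search earlier in this thread. The field name capped_total_bytes is illustrative only, and tonumber() is added so that the comparison is numeric rather than string-based. This is not a description of what the outlier command does internally.

```spl
| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1), total_bytes=tonumber(mvindex(splitted,2,2))
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval upper=p75+(iqr*1.5)
| eval capped_total_bytes=if(total_bytes>upper, round(upper), total_bytes)
```

To remove the outliers instead of capping them, the last eval can be swapped for | where total_bytes<=upper.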


jeelong
Explorer

Thanks ITWhisperer.

I have been finding outliers using that p25 and p75 approach to date.

I had created some fairly complex SPL to find the outliers, remove them, and create a baseline for comparison with current data.

This is so we could find spikes in current data compared to a baseline created from the previous 365 days.
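For readers who want to see the general shape of that kind of search, here is a rough sketch on the sample data from this thread. It is not the search described above: the 1.5 multiplier, the spike factor of 2, and the field names is_outlier, clean_bytes, baseline_avg, and is_spike are all assumptions chosen for illustration.

```spl
| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data
| eval splitted = split(data,",")
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1), total_bytes=tonumber(mvindex(splitted,2,2))
| fields day_hour_key,total_bytes,date
| eventstats perc25(total_bytes) as p25 perc75(total_bytes) as p75
| eval iqr=p75-p25
| eval lower=p25-(iqr*1.5)
| eval upper=p75+(iqr*1.5)
| eval is_outlier=if(total_bytes<lower OR total_bytes>upper, 1, 0)
| eval clean_bytes=if(is_outlier==1, null(), total_bytes)
| eventstats avg(clean_bytes) as baseline_avg by day_hour_key
| eval is_spike=if(total_bytes>baseline_avg*2, 1, 0)
```

Because avg ignores null values, the rows marked as outliers do not contribute to baseline_avg, but they are still compared against it in the final eval.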

It works, mostly, but there is much room for improvement. To this end, I have begun looking at Splunk MLTK to see if I could get better results from it.

I will be diving into "anomalydetection" and "persist", for instance. As I am not a data scientist, I will no doubt be winging it to a large extent. I did want to understand as much as possible about what these are doing under the hood, but I knew I would have to "trust in the force" to some extent.

If I cannot easily decipher what the outlier command is returning, then it is not a good sign for when I dive deeper into MLTK. 😥

Oh well. Crash or crash through, as they say. 😀 Thanks again for your insights.

 

 

 


gcusello
SplunkTrust

Hi @jeelong,

Did you try the outlier command without options?

| makeresults
| fields - _time
| eval data="101,20220101,3;101,20220102,200;101,20220103,210;101,20220104,220;101,20220105,200;101,20220106,210;101,20220107,220;101,20220108,92000;101,20220109,200;101,20220110,3;"
| makemv delim=";" data
| mvexpand data 
| eval splitted = split(data,",") 
| eval day_hour_key=mvindex(splitted,0,0), date=mvindex(splitted,1,1) , total_bytes=mvindex(splitted,2,2)
| fields day_hour_key,total_bytes,date
| outlier  
| rename total_bytes as transform_total_bytes

Ciao.

Giuseppe
