Other Usage

How to customize external function for normal distribution?

POR160893
Builder

Hi,

I am doing statistical analysis on a number of indexes for time series forecasting.

The following article gives this sample SPL query:
| gentimes start="01/01/2018" increment=1h
| eval _time=starttime, loc=0, scale=20
| normal loc=loc scale=scale
| streamstats count as cnt
| eval gen_normal = gen_normal + cnt
| table _time, gen_normal
| rename gen_normal as "Non-stationary time series (trend)"

Article: https://towardsdatascience.com/time-series-forecasting-with-splunk-part-i-intro-kalman-filter-46e4bf...
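For readers without the article's custom normal command, here is a rough Python sketch of the series the query builds: one row per hour starting 2018-01-01, Gaussian noise with mean loc=0 and standard deviation scale=20, plus the running count from streamstats as a linear trend. It assumes normal draws independent normal samples with loc as the mean and scale as the standard deviation; random.gauss stands in for it, and toy_trend_series is a hypothetical name.

```python
import random
from datetime import datetime, timedelta


def toy_trend_series(hours, loc=0.0, scale=20.0, seed=None):
    """Approximate the query: hourly normal noise plus a running count.

    gentimes increment=1h              -> one row per hour from 2018-01-01 00:00
    normal loc=... scale=...           -> assumed to behave like random.gauss(loc, scale)
    streamstats count as cnt           -> 1, 2, 3, ...
    eval gen_normal = gen_normal + cnt -> adds the linear trend
    """
    rng = random.Random(seed)
    start = datetime(2018, 1, 1)
    rows = []
    for cnt in range(1, hours + 1):
        ts = start + timedelta(hours=cnt - 1)
        gen_normal = rng.gauss(loc, scale) + cnt
        rows.append((ts, gen_normal))
    return rows


# Three weeks of hourly points.
series = toy_trend_series(24 * 21, seed=42)
print(len(series), series[0][0], series[-1][0])
```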

The "normal" command is a custom external command. How and where can I get statistical functions like this into Splunk?


Many thanks as always,

tscroggins
Influencer

Hi,

I read the same article several years ago and created macros similar to Excel's NORMINV and RAND functions for exactly this purpose:

# macros.conf

[norminv(3)]
args = p,u,s
definition = "exact($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0))))"
iseval = 1

[rand]
definition = "random()/2147483647"
iseval = 1
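For readability, this is what the norminv definition computes. The constants match the rational approximation listed as formula 26.2.23 in Abramowitz and Stegun's Handbook of Mathematical Functions, which states an absolute error below 4.5 × 10^-4; that attribution is inferred from the coefficients, not stated in the macro itself. lognorminv (further down the thread) wraps the same expression in exp().

```latex
t = \begin{cases}
  \sqrt{-2\ln p}, & p < 0.5 \\
  \sqrt{-2\ln(1-p)}, & p \ge 0.5
\end{cases}
\qquad
r = t - \frac{c_0 + c_1 t + c_2 t^2}{1 + d_1 t + d_2 t^2 + d_3 t^3}

\operatorname{norminv}(p, u, s) = u + s \cdot \begin{cases} -r, & p < 0.5 \\ r, & p \ge 0.5 \end{cases}

c_0 = 2.515517,\; c_1 = 0.802853,\; c_2 = 0.010328,\;
d_1 = 1.432788,\; d_2 = 0.189269,\; d_3 = 0.001308
```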

There's further discussion of the macro implementation in a previous post: https://community.splunk.com/t5/Splunk-Search/Outlier-Dip-Trough-Detection/m-p/550122/highlight/true... 
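To sanity-check the macro arithmetic outside Splunk, here is a minimal Python transcription of norminv compared against the standard library's exact inverse CDF, statistics.NormalDist.inv_cdf (Python 3.8+). It is a sketch for checking, not part of the macro; with these coefficients the worst-case absolute error should come out at roughly 4.5e-4 or less.

```python
import math
from statistics import NormalDist

# Coefficients copied from the norminv macro definition.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def norminv(p, u=0.0, s=1.0):
    """Transcription of the norminv(p,u,s) macro: approximate normal inverse CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1 (ln(0) is undefined)")
    # The macro's if() works in the tail nearest to p: ln(p) below 0.5, ln(1-p) otherwise.
    q = p if p < 0.5 else 1.0 - p
    t = math.sqrt(-2.0 * math.log(q))
    # Same nesting as the macro: ((c2*t + c1)*t + c0) / (((d3*t + d2)*t + d1)*t + 1).
    r = t - ((C2 * t + C1) * t + C0) / (((D3 * t + D2) * t + D1) * t + 1.0)
    return u + s * (-r if p < 0.5 else r)


# Compare with the exact standard normal inverse CDF on p = 0.001 ... 0.999.
exact = NormalDist(mu=0.0, sigma=1.0)
grid = [i / 1000 for i in range(1, 1000)]
worst = max(abs(norminv(p) - exact.inv_cdf(p)) for p in grid)
print(f"max abs error: {worst:.2e}")
```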

To recreate the toy example from the original article:

| gentimes start="01/01/2018" end="01/22/2018" increment=1h 
| eval _time=starttime, loc=0, scale=20 
| eval gen_normal=`norminv("`rand()`", loc, scale)` 
| streamstats count as cnt 
| eval gen_normal=gen_normal+cnt
| table _time gen_normal
| rename gen_normal as "Non-stationary time series (trend)"
| predict algorithm=LLT future_timespan=200 "Non-stationary time series (trend)"
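The toy example is inverse transform sampling: draw a uniform p, push it through the normal inverse CDF, then add the trend. Below is a Python sketch of that pipeline (trend_series is a hypothetical name); it draws exactly one uniform value per row and rejects p == 0, where ln(p) is undefined. In SPL, one way to make sure a single draw feeds every $p$ in the macro is to store it in a field first, for example | eval p=random()/2147483647 | eval gen_normal=`norminv(p, loc, scale)`. Note that the Splunk docs describe random() as returning an integer from 0 to 2^31-1, so p can in rare cases be exactly 0 or 1, where the formula's ln() has no finite value.

```python
import math
import random

# Coefficients copied from the norminv macro.
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308


def norminv(p, u, s):
    """Same arithmetic as the norminv(p,u,s) macro; p must lie in (0, 1)."""
    q = p if p < 0.5 else 1.0 - p
    t = math.sqrt(-2.0 * math.log(q))
    r = t - ((C2 * t + C1) * t + C0) / (((D3 * t + D2) * t + D1) * t + 1.0)
    return u + s * (-r if p < 0.5 else r)


def trend_series(n, loc=0.0, scale=20.0, seed=None):
    """Inverse transform sampling plus trend, one uniform draw per row."""
    rng = random.Random(seed)
    values = []
    for cnt in range(1, n + 1):
        p = rng.random()      # uniform on [0, 1)
        while p == 0.0:       # reject the one value where ln(p) is undefined
            p = rng.random()
        # Normal noise from the inverse CDF, plus a streamstats-style counter.
        values.append(norminv(p, loc, scale) + cnt)
    return values


values = trend_series(24 * 21, seed=1)
```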

tscroggins
Influencer

As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):

# macros.conf

[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1

[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1

[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1
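Read as inverse CDFs: expinv(p,b) solves p = 1 - exp(-b*x), so b is the rate (mean 1/b); weibullinv(p,a,b) solves p = 1 - exp(-(x/a)^b), so a is the scale and b the shape; lognorminv(p,u,s) is exp() of the norminv formula, so u and s are the mean and standard deviation of the underlying normal and it inherits the same approximation error. The Python sketch below transcribes expinv and weibullinv and checks that each inverts its CDF; the function names mirror the macros, while the *_cdf helpers are added only for the check. Both formulas need p < 1.

```python
import math


def expinv(p, b):
    """expinv(p,b) macro: inverse CDF of an exponential with rate b (mean 1/b)."""
    return -(1.0 / b) * math.log(1.0 - p)


def weibullinv(p, a, b):
    """weibullinv(p,a,b) macro: inverse CDF of a Weibull with scale a and shape b."""
    return a * math.pow(-math.log(1.0 - p), 1.0 / b)


# Closed-form CDFs, used only to confirm that each macro inverts its distribution.
def exp_cdf(x, b):
    return 1.0 - math.exp(-b * x)


def weibull_cdf(x, a, b):
    return 1.0 - math.exp(-((x / a) ** b))


for p in (0.01, 0.25, 0.5, 0.9, 0.999):
    x_exp = expinv(p, 0.5)
    x_wb = weibullinv(p, 3.0, 1.5)
    print(p, round(exp_cdf(x_exp, 0.5), 6), round(weibull_cdf(x_wb, 3.0, 1.5), 6))
```

A Weibull with shape 1 is an exponential with rate 1/a, which gives a quick cross-check between the two macros.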

POR160893
Builder

Perfect! Thank you so much for this information.
