Other Usage

How to customize external function for normal distribution?

POR160893
Builder

Hi,

I am doing statistical analysis on a number of indexes for time series forecasting.

On reading the following article, its gives a sample SPL query as follows:
| gentimes start=”01/01/2018" increment=1h
| eval _time=starttime, loc=0, scale=20
| normal loc=loc scale=scale
| streamstats count as cnt
| eval gen_normal = gen_normal + cnt
| table _time, gen_normal
| rename gen_normal as “Non-stationary time series (trend)”

[Article is this: ]https://towardsdatascience.com/time-series-forecasting-with-splunk-part-i-intro-kalman-filter-46e4bf...

The "normal" command is a cutom external command and I wanted to ask how and where I can get such statistical functions into Splunk?


Many thanks as always,

0 Karma
1 Solution

tscroggins
Influencer

As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):

 

# macros.conf

[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1

[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1

[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1

 

 

View solution in original post

tscroggins
Influencer

Hi,

I read the same article several years ago and created a macros similar to Excel's NORMINV and RAND just for this purpose:

 

# macros.conf

[norminv(3)]
args = p,u,s
definition = "exact($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0))))"
iseval = 1

[rand]
definition = "random()/2147483647"
iseval = 1

 

There's further discussion of the macro implementation in a previous post: https://community.splunk.com/t5/Splunk-Search/Outlier-Dip-Trough-Detection/m-p/550122/highlight/true... 

To recreate the toy example from the original article:

| gentimes start="01/01/2018" end="01/22/2018" increment=1h 
| eval _time=starttime, loc=0, scale=20 
| eval gen_normal=`norminv("`rand()`", loc, scale)` 
| streamstats count as cnt 
| eval gen_normal=gen_normal+cnt
| table _time gen_normal
| rename gen_normal as "Non-stationary time series (trend)"
| predict algorithm=LLT future_timespan=200 "Non-stationary time series (trend)"
0 Karma

tscroggins
Influencer

As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):

 

# macros.conf

[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1

[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1

[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1

 

 

POR160893
Builder

Perfect! Thank you so much for this information.

Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...