Other Usage

How to customize external function for normal distribution?

POR160893
Builder

Hi,

I am doing statistical analysis on a number of indexes for time series forecasting.

On reading the following article, its gives a sample SPL query as follows:
| gentimes start=”01/01/2018" increment=1h
| eval _time=starttime, loc=0, scale=20
| normal loc=loc scale=scale
| streamstats count as cnt
| eval gen_normal = gen_normal + cnt
| table _time, gen_normal
| rename gen_normal as “Non-stationary time series (trend)”

[Article is this: ]https://towardsdatascience.com/time-series-forecasting-with-splunk-part-i-intro-kalman-filter-46e4bf...

The "normal" command is a cutom external command and I wanted to ask how and where I can get such statistical functions into Splunk?


Many thanks as always,

0 Karma
1 Solution

tscroggins
Influencer

As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):

 

# macros.conf

[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1

[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1

[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1

 

 

View solution in original post

tscroggins
Influencer

Hi,

I read the same article several years ago and created a macros similar to Excel's NORMINV and RAND just for this purpose:

 

# macros.conf

[norminv(3)]
args = p,u,s
definition = "exact($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0))))"
iseval = 1

[rand]
definition = "random()/2147483647"
iseval = 1

 

There's further discussion of the macro implementation in a previous post: https://community.splunk.com/t5/Splunk-Search/Outlier-Dip-Trough-Detection/m-p/550122/highlight/true... 

To recreate the toy example from the original article:

| gentimes start="01/01/2018" end="01/22/2018" increment=1h 
| eval _time=starttime, loc=0, scale=20 
| eval gen_normal=`norminv("`rand()`", loc, scale)` 
| streamstats count as cnt 
| eval gen_normal=gen_normal+cnt
| table _time gen_normal
| rename gen_normal as "Non-stationary time series (trend)"
| predict algorithm=LLT future_timespan=200 "Non-stationary time series (trend)"
0 Karma

tscroggins
Influencer

As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):

 

# macros.conf

[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1

[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1

[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1

 

 

POR160893
Builder

Perfect! Thank you so much for this information.

Get Updates on the Splunk Community!

Announcing the Expansion of the Splunk Academic Alliance Program

The Splunk Community is more than just an online forum — it’s a network of passionate users, administrators, ...

Learn Splunk Insider Insights, Do More With Gen AI, & Find 20+ New Use Cases You Can ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Buttercup Games: Further Dashboarding Techniques (Part 7)

This series of blogs assumes you have already completed the Splunk Enterprise Search Tutorial as it uses the ...