Hi,
I am doing statistical analysis on a number of indexes for time series forecasting.
On reading the following article, its gives a sample SPL query as follows:
| gentimes start=”01/01/2018" increment=1h
| eval _time=starttime, loc=0, scale=20
| normal loc=loc scale=scale
| streamstats count as cnt
| eval gen_normal = gen_normal + cnt
| table _time, gen_normal
| rename gen_normal as “Non-stationary time series (trend)”
[Article is this: ]https://towardsdatascience.com/time-series-forecasting-with-splunk-part-i-intro-kalman-filter-46e4bf...
The "normal" command is a cutom external command and I wanted to ask how and where I can get such statistical functions into Splunk?
Many thanks as always,
As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):
# macros.conf
[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1
[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1
[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1
Hi,
I read the same article several years ago and created a macros similar to Excel's NORMINV and RAND just for this purpose:
# macros.conf
[norminv(3)]
args = p,u,s
definition = "exact($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0))))"
iseval = 1
[rand]
definition = "random()/2147483647"
iseval = 1
There's further discussion of the macro implementation in a previous post: https://community.splunk.com/t5/Splunk-Search/Outlier-Dip-Trough-Detection/m-p/550122/highlight/true...
To recreate the toy example from the original article:
| gentimes start="01/01/2018" end="01/22/2018" increment=1h
| eval _time=starttime, loc=0, scale=20
| eval gen_normal=`norminv("`rand()`", loc, scale)`
| streamstats count as cnt
| eval gen_normal=gen_normal+cnt
| table _time gen_normal
| rename gen_normal as "Non-stationary time series (trend)"
| predict algorithm=LLT future_timespan=200 "Non-stationary time series (trend)"
As an aside, I don't know of any generally available statistical package for Splunk that contains generating commands for commonly used distributions. I write macros as needed. For example (with no guarantee of correctness!):
# macros.conf
[expinv(2)]
args = p,b
definition = "exact(-(1/$b$)*ln(1-$p$))"
iseval = 1
[lognorminv(3)]
args = p,u,s
definition = "exact(exp($u$ + $s$ * if($p$ < 0.5, -1 * (sqrt(-2.0 * ln($p$)) - ((0.010328 * sqrt(-2.0 * ln($p$)) + 0.802853) * sqrt(-2.0 * ln($p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln($p$)) + 0.189269) * sqrt(-2.0 * ln($p$)) + 1.432788) * sqrt(-2.0 * ln($p$)) + 1.0)), (sqrt(-2.0 * ln(1 - $p$)) - ((0.010328 * sqrt(-2.0 * ln(1 - $p$)) + 0.802853) * sqrt(-2.0 * ln(1 - $p$)) + 2.515517) / (((0.001308 * sqrt(-2.0 * ln(1 - $p$)) + 0.189269) * sqrt(-2.0 * ln(1 - $p$)) + 1.432788) * sqrt(-2.0 * ln(1 - $p$)) + 1.0)))))"
iseval = 1
[weibullinv(3)]
args = p,a,b
definition = "exact($a$*pow(-ln(1-$p$),1/$b$))"
iseval = 1
Perfect! Thank you so much for this information.