I'm currently trying to translate Splunk functions into SAS, and was hoping for some clarification on the prediction function.
Are the algorithms Splunk uses for the prediction models proprietary? If not, is there any further documentation/explanation concerning the predict function algorithms Splunk uses? We are hoping to replicate the predict function analysis from Splunk in SAS, and want to be sure we fully understand the step-by-step calculations Splunk uses as we do so.
How do we interpret the following prediction algorithms: LL, LLP, LLT, LLB?
How do we interpret the lower 95 and upper 95 (prediction count)?
Can you please give us a real-world example where “malware” fell above or below the predicted value’s confidence interval based on the dataset used and the time series model utilized?
All the algorithms are based on the Kalman filter, which is not proprietary. However, some of the variations we developed are proprietary. Explanations of the Kalman filter can be found in the published literature. One book I found useful while implementing the predict command is "An Introduction to State Space Time Series Analysis" by Commandeur and Koopman.
Local Level (LL): this is a univariate model with no trend and no seasonality.
Seasonal Local Level (LLP): this is a univariate model with seasonality. The periodicity of the time series is computed automatically.
Local Level Trend (LLT): this is a univariate model with trend but no seasonality.
Bivariate Local Level (LLB): this is a bivariate model with no trends and no seasonality.
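
For anyone porting this to SAS, it may help to see the simplest of the models above, the local level model, written out. The following is a minimal sketch of the textbook Kalman filter for that model, not Splunk's actual implementation. The function name, the parameter names, and the choice to pass the two variances in by hand are assumptions made for illustration; in practice the variances are estimated from the data.

```python
def local_level_filter(ys, obs_var, level_var, a0=0.0, p0=1e7):
    """Textbook Kalman filter for the local level model:
        y_t      = mu_t + eps_t,  eps_t ~ N(0, obs_var)
        mu_{t+1} = mu_t + xi_t,   xi_t  ~ N(0, level_var)
    Returns the one-step-ahead predictions (mean, variance) for each
    observation and the final predicted state (mean, variance).
    The large p0 is a common way to express an uninformative starting level.
    """
    a, p = a0, p0                    # predicted level and its variance
    predictions = []
    for y in ys:
        f = p + obs_var              # variance of the one-step-ahead prediction
        predictions.append((a, f))
        k = p / f                    # Kalman gain
        a = a + k * (y - a)          # update the level with the new observation
        p = p * (1.0 - k) + level_var  # variance of the next predicted level
    return predictions, (a, p)


# Hypothetical data: a flat series at 5.0
preds, (level, level_variance) = local_level_filter([5.0] * 50, obs_var=1.0, level_var=0.1)
```

With no trend and no seasonal component, the forecast for every future step is simply the last estimated level; only its variance grows with the horizon.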
The lower 95 and upper 95 values specify a 95% confidence interval: the range in which we expect 95% of the predictions to fall.
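
As a rough illustration of how such bounds are usually obtained in a state space model, the sketch below turns a predicted mean and variance into a symmetric 95% interval. It assumes normally distributed prediction errors, which is the standard assumption in the Kalman filter literature; the thread does not confirm that Splunk computes its bounds exactly this way, and the function name is hypothetical.

```python
from statistics import NormalDist


def prediction_interval(mean, variance, level=0.95):
    """Symmetric interval around a prediction, assuming Gaussian errors."""
    # Two-sided quantile, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width


# Hypothetical prediction: mean 10 with variance 4 (standard deviation 2)
lower95, upper95 = prediction_interval(10.0, 4.0)
```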
Most of the time, some of the predictions will fall outside the confidence interval. That is normal for two reasons: (1) the confidence interval does not cover 100% of the predictions, and (2) the confidence interval expresses a probabilistic expectation, and actual outcomes do not match the expectation exactly.
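
A small simulation makes this point concrete. It is a sketch on synthetic data only, not a real malware dataset: it draws standardized prediction errors from a normal distribution (an assumption) and counts how many land outside the 95% band.

```python
import random


def fraction_outside(n, z=1.96, seed=0):
    """Share of n simulated standardized errors that fall outside +/- z."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    outside = sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) > z)
    return outside / n


frac = fraction_outside(10_000)
print(f"fraction outside the 95% band: {frac:.3f}")
```

The fraction comes out close to 5% but rarely exactly 5%, which is the same behavior you should expect from real observations plotted against the lower 95 and upper 95 bounds.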