I am training and evaluating a forecast model with MLTK's StateSpaceForecast. I would like to fit on part of the dataset and evaluate on a held-back test set. The catch is that I want the model to forecast 15 minutes ahead at every point in the test set, autoregressively using the feature values observed up to that point.
For example, the query below tries to compute the TPR, FPR, etc. for predicting SLA violations on my holdout set. Currently, it only forecasts 2 hours out from the beginning of the holdout set, instead of producing a rolling 15-minute-ahead forecast:
| fit StateSpaceForecast latency_p95_log from latency_p95_log, threadcount_p95, threadcount, total_socket_errors, n_running_procs, time_wait_cpu, HourOfDay, DayOfWeek holdback=2h forecast_k=15m conf_interval=95 into ml_latency_forecast
| apply ml_latency_forecast forecast_k=2h holdback=2h
| eval predicted = exp('predicted(latency_p95_log)')
| eval predicted_low=exp('lower95(predicted(latency_p95_log))'), predicted_high=exp('upper95(predicted(latency_p95_log))')
| eval predicted_SLA = if(predicted > 1.0, 1, 0)
| eval true_positive = if(predicted_SLA=1 AND SLA_violation=1, 1, 0)
| eval false_positive = if(predicted_SLA=1 AND SLA_violation=0, 1, 0)
| eval true_negative = if(predicted_SLA=0 AND SLA_violation=0, 1, 0)
| eval false_negative = if(predicted_SLA=0 AND SLA_violation=1, 1, 0)
| eval holdout = if(isnull('lower95(predicted(latency_p95_log))'), 0, 1)
| table _time predicted predicted_high predicted_low latency_p95
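The per-event flags in the query only turn into TPR and FPR once they are summed over the scored rows. In SPL, something like | where holdout=1 AND isnotnull(SLA_violation) | stats sum(true_positive) AS TP, sum(false_positive) AS FP, sum(true_negative) AS TN, sum(false_negative) AS FN | eval TPR=TP/(TP+FN), FPR=FP/(FP+TN) in place of the final table would do the aggregation. As a minimal sketch of the same arithmetic outside Splunk, here it is in plain Python; the function name confusion_rates is hypothetical, and the default threshold of 1.0 just mirrors predicted > 1.0 in the query.

```python
from typing import Dict, Sequence


def confusion_rates(
    predicted: Sequence[float], actual: Sequence[int], threshold: float = 1.0
) -> Dict[str, float]:
    """Count TP/FP/TN/FN for a thresholded forecast and derive TPR and FPR.

    predicted: forecast latency on the original scale (i.e. after exp())
    actual:    0/1 SLA_violation labels aligned with predicted
    """
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        flag = p > threshold  # same rule as predicted_SLA in the query
        violated = bool(a)
        if flag and violated:
            tp += 1
        elif flag:
            fp += 1
        elif violated:
            fn += 1
        else:
            tn += 1
    # Return NaN instead of dividing by zero when a class is absent.
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "TPR": tpr, "FPR": fpr}


rates = confusion_rates([0.5, 1.2, 1.5, 0.8, 2.0], [0, 1, 0, 1, 1])
# TP=2, FP=1, TN=1, FN=1, FPR=0.5
```

TPR is TP / (TP + FN) and FPR is FP / (FP + TN), so both need the counts over the whole holdout, not the per-row flags.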
Are there any examples of fitting a forecast and evaluating it on data that was not seen during training?
Splunk MLTK Algorithms on GitHub
Hi @rfdickerson,
The Python source code for Splunk's implementation of StateSpaceForecast is split across:
$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos/StateSpaceForecast.py
$SPLUNK_HOME/etc/apps/Splunk_ML_Toolkit/bin/algos_support/statespace/*
The StateSpaceForecast algorithm is similar to the Splunk predict command.
If you're not managing your own Splunk instance, you can download the MLTK archive from Splunkbase and inspect the files directly.
The holdback and forecast_k parameters behave as documented. For more control over the window of data used to update the model dynamically, you may want to look at the partial_fit parameter, which you would use before running apply and, eventually, calculating TPR and FPR.
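The evaluation described in the question is usually called walk-forward (rolling-origin) evaluation: at every point in the holdout, forecast 15 minutes ahead using only what has been observed so far, then score that forecast against what actually happened. The sketch below shows the loop in plain Python, not MLTK. walk_forward and the naive persistence forecaster are hypothetical stand-ins; the forecaster slot is where a StateSpaceForecast model (for example, one kept current with partial_fit) would conceptually plug in, but that mapping is an assumption, not an MLTK API. It also assumes one row per minute, so a 15-minute horizon is 15 steps.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical forecaster signature:
# (feature rows seen so far, target values seen so far, horizon in steps) -> prediction
Forecaster = Callable[[Sequence[object], Sequence[float], int], float]


def walk_forward(
    rows: Sequence[object],
    target: Sequence[float],
    train_size: int,
    horizon: int,
    forecast: Forecaster,
) -> List[Tuple[int, float, float]]:
    """Rolling-origin evaluation over the holdout region.

    At each origin t the forecaster only sees rows[:t + 1] and target[:t + 1]
    (everything observed up to and including t) and predicts target[t + horizon].
    Returns (target_index, predicted, actual) triples.
    """
    if len(rows) != len(target):
        raise ValueError("rows and target must have the same length")
    results: List[Tuple[int, float, float]] = []
    # The first origin is the last training point, so every scored target
    # falls inside the holdout region (index >= train_size).
    for t in range(train_size - 1, len(target) - horizon):
        pred = forecast(rows[: t + 1], target[: t + 1], horizon)
        results.append((t + horizon, pred, target[t + horizon]))
    return results


def persistence(rows: Sequence[object], target: Sequence[float], horizon: int) -> float:
    """Naive baseline: assume the target stays at its latest observed value."""
    return target[-1]


# Toy data, assuming one row per minute, so a 15-minute horizon is 15 steps.
latency = [0.8 + 0.01 * i for i in range(120)]
features = [(i % 60,) for i in range(120)]  # stand-in for threadcount, HourOfDay, ...
scored = walk_forward(features, latency, train_size=90, horizon=15, forecast=persistence)
```

Because the history handed to the forecaster is sliced at the origin, no future values can leak into a prediction. The resulting (predicted, actual) pairs can then be thresholded the same way predicted_SLA is in the query.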