I am trying to get a better understanding of the predict function in Splunk 6.1.2.
I have the below search:
... | predict SGSN02KPR as predict1 future_timespan=10 holdback=0
Questions:
Clarifications:
Hi @HattrickNZ.
Hope this helps.
tks @tlagatta_splunk very helpful
I am using holdback=0 to use all values in my model, but it does not seem to do that. It always seems to predict for values I already have, hence the yellow line overlapping the blue line above.
Other observations:
In your search the future timespans need to line up, e.g. ... latest=+100d@d ... future_timespan=100
holdback also has something to do with the last date in your timespan, e.g. if you use future_timespan=100, the last date will be 100 minus the holdback value....
Hi @HattrickNZ, glad it helped.
This is a feature, not a bug! You should always predict the past values, to calibrate the prediction and make sure it's doing what you expect it to do. In many cases, the first attempt will do a poor job of predicting the past, which means you have to tweak it to make things work (e.g., add more historical data or make the timespan finer, like change span=1mon to span=1w). If you only predict the future, you won't know if the prediction is bad or not until you have to make decisions on it, which is usually too late.
Unfortunately, simple linear regressions are not implemented in the core product right now. If you're looking for just linear trendlines, this community-wiki post on plotting a linear trendline might help. Keep in mind that the predict command implements a Kalman filter, so it's a pretty robust way to make temporal predictions.
tks again @tlagatta_splunk
So can I control how many past values it will predict for calibration? Is there a minimum default setting for the number of past values it will predict for calibration? And is this the holdback or something else?
From a visual point of view it would be good to be able to do the calibration and then have the option to remove it as well. But hey 🙂
By default, the predict command uses all past values to build a model of the timeseries (incl. best-fit curve and uncertainty envelope). The holdback argument allows you to leave recent points out of the training process.
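To make the holdback behaviour concrete, here is a minimal Python sketch. It is not Splunk's actual implementation. It assumes the simplest Kalman-filter model (a local level, i.e. a random walk observed with noise), and the function name local_level_predict and the noise parameters q and r are made up for illustration. It also assumes, based on the observation earlier in this thread, that the forecast starts at the holdback cutoff rather than at the last data point.

```python
def local_level_predict(values, future_timespan=0, holdback=0, q=1.0, r=1.0):
    """Illustrative local-level Kalman filter (hypothetical helper, not Splunk code).

    Returns one prediction per training point (the "calibration" part),
    followed by future_timespan forecasts that start at the holdback cutoff.
    """
    if holdback < 0 or holdback >= len(values):
        raise ValueError("holdback must be between 0 and len(values) - 1")

    # The last `holdback` points are left out of training.
    train = values[:len(values) - holdback]

    level = train[0]  # initial level estimate (assumption: start at first value)
    var = r           # initial uncertainty (assumption)
    predictions = []

    for y in train:
        # Predict step: the level carries over, the uncertainty grows by q.
        var_pred = var + q
        predictions.append(level)  # one-step-ahead prediction of a past value
        # Update step: pull the level towards the observed value.
        gain = var_pred / (var_pred + r)
        level = level + gain * (y - level)
        var = (1 - gain) * var_pred

    # Forecast step: a local-level model forecasts a flat line at the last level.
    predictions.extend([level] * future_timespan)
    return predictions


series = [10.0] * 20  # 20 daily values, i.e. 20 data points
no_holdback = local_level_predict(series, future_timespan=10, holdback=0)
with_holdback = local_level_predict(series, future_timespan=10, holdback=5)
print(len(no_holdback))    # 30: 20 past predictions + 10 future
print(len(with_holdback))  # 25: 15 past predictions + 10 from the cutoff
```

Even with holdback=0 the output covers all 20 past points, which is why the predicted line overlaps the raw data on the chart. With holdback=5, the first 5 of the 10 forecast points fall on dates you already have data for, so you can compare them against the actual values.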
If you have enough data points (1 time span = 1 data point), then the best-fit curve and uncertainty envelope should both track closely to the past data. If not, then add more historical data or choose a finer span.
I do not advise removing this, even for visualization purposes. If something in your data changes and the prediction loses its accuracy (e.g., some rare event occurs and severely changes the model), then you want to see that immediately. When you use the predict command to make decisions, you should do so based on both the past & future trendlines, rather than a mix of the raw data & the future trendline alone.
In terms of options, you can always use the search language to further manipulate the data. The following query will remove the prediction from rows where the count field is non-null. I can't prevent you from doing this, but I do strongly advise you against it 🙂
| foreach prediction [eval <<FIELD>>=if(isnotnull(count), null(), '<<FIELD>>')]
@tlagatta_splunk thanks very much for your help on this....
How do I not include today's value in the real values? I am working on max values per day, and if I run this search in the morning, the max for today won't be hit until later in the day, so I would like to remove (not use) today's value. In fact I can do this using holdback=1, but that won't stop it showing in the graph. Is there a way to remove it?
My search looks something like:
...earliest=-120d@d latest=+300d@d | timechart span=d max(KPI1) by DeviceName | predict Device1 as predict1 future_timespan=300 holdback=10