Hi
I am using the MLTK.
I have a question about the use case "Detect Numeric Outliers", specifically line #4.
Why is that line important when detecting outliers? I have plotted 2 graphs. Graph 1 uses line #4 and Graph 2 does not.
To me, Graph 2 seems the more accurate one because it shows the forecast (future_timespan=172) from 30 Nov to 4 Dec. The other one just eliminates those days (it only shows data up to 30 Nov).
1. | inputlookup logins.csv
2. | predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
3. | eval residual = prediction - logins
4. | where prediction!="" AND logins!=""
5. | table _time, logins prediction residual
Graph 1, USING: where prediction!="" AND logins!=""
Graph 2, WITHOUT: where prediction!="" AND logins!=""
You can't really tell which graph is more accurate based on the forecast timespan.
The only reason the 1st one is not showing the forecast data is `| where prediction!="" AND logins!=""`: logins will always be null in the future.
What that statement really does is eliminate every row where logins or prediction is null, and I'm not sure that's what you wanted.
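If the goal is to keep the forecast on the chart and still have a residual for the historical rows, one option is to skip the `where` and compute the residual only where both values exist. This is just a sketch, not the showcase's own search: the `is_forecast` field is a made-up name, and it assumes the historical part of logins.csv has no gaps, so a null logins value always means a forecast row.

```spl
| inputlookup logins.csv
| predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
``` residual is only set when both an actual and a predicted value exist ```
| eval residual = if(isnotnull(logins) AND isnotnull(prediction), prediction - logins, null())
``` is_forecast is a hypothetical helper field: 1 for rows with no actual logins value ```
| eval is_forecast = if(isnull(logins), 1, 0)
| table _time, logins, prediction, residual, is_forecast
```

With this, the forecast rows stay in the results with an empty residual, and you can still filter on `is_forecast=0` later if you only want the rows that can be scored as outliers.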
Hope this helps. Thanks!
Hi
Yes, that line is there to avoid nulls. It should be in position 3, NOT 4, because "eval" cannot compute a residual for rows where either value is null. I have used it in other use cases.
In this use case, I prefer to remove it because I want to see the "forecast".
1. | inputlookup logins.csv
2. | predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
3. | where prediction!="" AND logins!=""
4. | eval residual = prediction - logins
5. | table _time, logins prediction residual
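A quick way to check what the filter drops without touching the lookup is a small generated data set. This is only a sketch with made-up values: `n` is a hypothetical row counter, the last two rows stand in for forecast rows with no logins, and `isnotnull()` is used here as an explicit stand-in for the `!=""` comparison.

```spl
| makeresults count=4
| streamstats count as n
``` rows 1-2 have an actual value, rows 3-4 imitate forecast rows ```
| eval logins = if(n <= 2, n * 10, null())
| eval prediction = n * 10 + 1
``` drop every row where either field is null ```
| where isnotnull(prediction) AND isnotnull(logins)
| eval residual = prediction - logins
| table n, logins, prediction, residual
```

Only the rows that have both values should survive the `where`, which is the same reason the forecast days disappear from the graph.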
Cool! Thanks