Last week one of our indexes had an indexing delay of an hour, whereas indexing delays are normally at most half a minute. I'm struggling to set the MLTK parameters so that cases like this are captured as outliers. Any ideas how to set it up correctly? The tolerance seems to be affected by the spike itself.
You mention that the tolerance is influenced by the spike itself. Are you fitting your algorithm on data that includes the intended outlier? Fitting only on data you consider normal would likely solve your issue. The same goes for continuous re-training via partial_fit: use it only after all new data has been scored with the old model state.
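To illustrate why the fitting data matters, here is a minimal plain-Python sketch. It is not the MLTK API: the mean-plus-k-standard-deviations threshold, the function names, and the delay values are all assumptions made up for illustration. It shows how a single one-hour spike in the fitting data can inflate the tolerance enough to hide itself, and how scoring new data against the old state before updating avoids that.

```python
from statistics import mean, stdev

def fit_threshold(training_delays, k=3.0):
    """Upper bound: mean + k * sample standard deviation (illustrative rule)."""
    return mean(training_delays) + k * stdev(training_delays)

def is_outlier(delay, threshold):
    """A delay is flagged when it exceeds the fitted upper bound."""
    return delay > threshold

def score_then_update(history, new_delays, k=3.0):
    """Score new data with the OLD threshold first, then extend the history.

    Dropping flagged points from the update is an extra assumption here,
    so that a spike never feeds back into the next threshold.
    """
    threshold = fit_threshold(history, k)  # old model state
    flagged = [d for d in new_delays if is_outlier(d, threshold)]
    kept = [d for d in new_delays if not is_outlier(d, threshold)]
    return flagged, history + kept

# Hypothetical indexing delays in seconds (normally up to half a minute).
normal_delays = [12, 18, 25, 9, 30, 22, 15, 27, 11]
spike = 3600  # the one-hour delay

# Fitted on normal data only: the spike is far above the bound.
clean_threshold = fit_threshold(normal_delays)

# Fitted on data that contains the spike: the bound is dragged upward.
contaminated_threshold = fit_threshold(normal_delays + [spike])

print("clean bound:", round(clean_threshold, 1),
      "flags spike:", is_outlier(spike, clean_threshold))
print("contaminated bound:", round(contaminated_threshold, 1),
      "flags spike:", is_outlier(spike, contaminated_threshold))

flagged, updated_history = score_then_update(normal_delays, [14, spike, 21])
print("flagged in new batch:", flagged)
```

With these made-up numbers the spike is flagged only when the threshold is fitted on the normal data alone. The MLTK algorithms use their own fitting logic, so treat this purely as a picture of the effect, not as a reproduction of their behaviour.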
If this is not the issue, some more information would help narrow down the root cause: which MLTK algorithm you are planning to use, your current parameter setup, and what data you are using for your train/test split.
@ljvc Thank you for the direction