
What does `| where prediction!="" AND logins!=""` do when detecting outliers?

rosho
Communicator

Hi

I am using the MLTK.
I have a question about the use case "Detect Numeric Outliers", specifically line #4 of the search below.
Why is it important when detecting outliers? I have plotted two graphs: Graph 1 uses line #4 and Graph 2 does not.

To me, Graph 2 seems more accurate because it shows the forecast (future_timespan=172) from 30 Nov to 4 Dec, whereas Graph 1 drops those days and only shows data up to 30 Nov.

1.  | inputlookup logins.csv 
2.  | predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
3.  | eval residual = prediction - logins
4.  | where prediction!="" AND logins!="" 
5.  | table _time, logins prediction residual

[Graph 1, USING: `where prediction!="" AND logins!=""`]

[Graph 2, WITHOUT: `where prediction!="" AND logins!=""`]


sandeepmakkena
Contributor

You can't really tell which graph is more accurate based on the forecast timespan.

The only reason the first graph does not show the forecast data is `| where prediction!="" AND logins!=""`: logins will always be null in the future, so every forecast row is filtered out.
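To check this on your own data, one option is to count the rows by which of `logins` and `prediction` are populated, before any filtering. This is a sketch rather than part of the original use case: `row_type` is a hypothetical field name, and the first two lines are copied from the search in the question. If the explanation holds, the "no logins (forecast)" bucket should contain the future rows that line #4 removes.

```spl
| inputlookup logins.csv
| predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
| eval row_type=case(isnull(logins), "no logins (forecast)", isnull(prediction), "no prediction", true(), "both present")
| stats count by row_type
```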

With `| where prediction!="" AND logins!=""`, what you're really doing is eliminating every row in which logins or prediction is null. I'm not sure that's what you wanted.
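If dropping those rows is what you want, the same filter can be written with explicit null checks, which may make the intent clearer. This sketch assumes the only missing values are nulls: `isnotnull()` treats an empty-string value as present, whereas `!=""` would drop it, so the two forms are not identical in every case.

```spl
| inputlookup logins.csv
| predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
| eval residual = prediction - logins
| where isnotnull(prediction) AND isnotnull(logins)
| table _time, logins, prediction, residual
```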

Hope this helps. Thanks!


rosho
Communicator

Hi

Yes, that line is there to avoid nulls, and it should be in position 3, NOT 4, because "eval" does not work if there are null values. I have used it that way in another use case.
In this use case, I prefer to remove it because I want to see the "forecast".

 1.  | inputlookup logins.csv 
 2.  | predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
 3.  | where prediction!="" AND logins!="" 
 4.  | eval residual = prediction - logins
 5.  | table _time, logins prediction residual
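For comparison, here is a sketch that keeps the forecast rows visible and still computes residuals wherever both values exist, with no `where` line at all. It rests on an assumption worth verifying on your Splunk version: an `eval` arithmetic expression that references a null field is expected to return null rather than fail. Under that assumption, the `if()` wrapper mainly makes the intent explicit, and forecast rows end up with a `prediction` but no `residual`.

```spl
| inputlookup logins.csv
| predict logins as prediction algorithm=LLP future_timespan=172 holdback=36
| eval residual = if(isnotnull(prediction) AND isnotnull(logins), prediction - logins, null())
| table _time, logins, prediction, residual
```

Because `where` is gone, the chart should still extend through the forecast period, while any residual-based outlier logic only sees rows that have an actual `logins` value.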

sandeepmakkena
Contributor

Cool! Thanks
