<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Cannot reproduce Predict command confidence interval in Splunk Search</title>
    <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436957#M124471</link>
    <description>&lt;P&gt;Thank you for your reply. I am using non-seasonal data for validation and hence selected LL. &lt;/P&gt;

&lt;P&gt;I understand that there is some difference between the Kalman filter in Python (filterpy library) and the one in Splunk. But the predicted means are about the same in both cases, while the confidence intervals are significantly apart, which puzzles me. My understanding is that the upper95 bound should be mean+1.96*variance. If the means are similar, then the variance must differ between the two algorithms. More details on that would be highly appreciated.&lt;/P&gt;

&lt;P&gt;Regards&lt;/P&gt;</description>
    <pubDate>Thu, 02 May 2019 19:24:42 GMT</pubDate>
    <dc:creator>jaideeplamba</dc:creator>
    <dc:date>2019-05-02T19:24:42Z</dc:date>
    <item>
      <title>Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436954#M124468</link>
      <description>&lt;P&gt;Dear Team,&lt;/P&gt;

&lt;P&gt;I understand that the predict command uses Kalman filters. I am comparing our existing Kalman implementation in Python (which uses filterpy) with Splunk. Interestingly, the Kalman means (predictions) are similar in Splunk and Python, but the confidence intervals are far apart. Your help is really appreciated. This is a roadblock for us in moving entirely to Splunk.&lt;/P&gt;

&lt;P&gt;My Splunk query:&lt;BR /&gt;
|inputlookup sample_predict_data.csv &lt;BR /&gt;
|fields _time,Response_Time &lt;BR /&gt;
|eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3N%:z")&lt;BR /&gt;
|convert num(_span)&lt;BR /&gt;
|fields _time,Response_Time&lt;BR /&gt;
|timechart span=5m avg(Response_Time) as "Response_Time"&lt;BR /&gt;
|predict "Response_Time" future_timespan=4 period=96 algorithm=LL&lt;/P&gt;

&lt;P&gt;Sample_predict_data.csv:&lt;BR /&gt;
Itr&lt;BR /&gt;
0     59.040042&lt;BR /&gt;
1     66.725715&lt;BR /&gt;
2     40.399476&lt;BR /&gt;
3     52.249948&lt;BR /&gt;
4     48.609610&lt;BR /&gt;
5     40.946166&lt;BR /&gt;
6     52.468450&lt;BR /&gt;
7     61.404242&lt;BR /&gt;
8     35.637950&lt;BR /&gt;
9     59.458336&lt;BR /&gt;
10    40.836213&lt;/P&gt;

&lt;P&gt;Sample output from Splunk:&lt;BR /&gt;
   _time  Response_Time  upper95(prediction(Response_Time))&lt;BR /&gt;
1 2019-04-11 02:30:00-05:00      66.725715                           73.969822&lt;BR /&gt;
2 2019-04-11 02:35:00-05:00      40.399476                           63.764238&lt;BR /&gt;
3 2019-04-11 02:40:00-05:00      52.249948                           63.185333&lt;BR /&gt;
4 2019-04-11 02:45:00-05:00      48.609610                           61.507178&lt;BR /&gt;
5 2019-04-11 02:50:00-05:00      40.946166                           57.659094&lt;BR /&gt;
6 2019-04-11 02:55:00-05:00      52.468450                           59.473529&lt;BR /&gt;
7 2019-04-11 03:00:00-05:00      61.404242                           63.887708&lt;BR /&gt;
8 2019-04-11 03:05:00-05:00      35.637950                           57.279082&lt;BR /&gt;
9 2019-04-11 03:10:00-05:00      59.458336                           61.778312&lt;/P&gt;

&lt;P&gt;Sample output from Python:&lt;BR /&gt;
Prediction  PythonUpperBound&lt;BR /&gt;
59.083758  64.242979&lt;BR /&gt;
52.332217  49.047618&lt;BR /&gt;
53.685847  48.838873&lt;BR /&gt;
53.086523  47.206724&lt;BR /&gt;
48.965072  42.431759&lt;BR /&gt;
51.753461  46.112887&lt;BR /&gt;
54.099557  53.597780&lt;BR /&gt;
52.231218  45.280999&lt;BR /&gt;
50.658427  51.773081&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 00:19:45 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436954#M124468</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2020-09-30T00:19:45Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436955#M124469</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Since you set period=96, you may want to use algorithm=LLP5. The 'LL' algorithm is for non-periodic data.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 16:39:24 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436955#M124469</guid>
      <dc:creator>nnguyen_splunk</dc:creator>
      <dc:date>2019-05-02T16:39:24Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436956#M124470</link>
      <description>&lt;P&gt;The key point, from the Splunk docs: all the algorithms are variations based on the Kalman filter.&lt;BR /&gt;
&lt;A href="https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Predict"&gt;https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Predict&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;So, in essence, there are bound to be differences when you compare this with an exact Kalman implementation in Python. If, for instance, you had used linear regression or k-means clustering for other use cases, the outputs would have tallied exactly with the Python libraries.&lt;BR /&gt;
It depends on your use case, but within the general limitations of the Kalman filter, Splunk does a pretty good job of implementing it.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 17:13:54 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436956#M124470</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-02T17:13:54Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436957#M124471</link>
      <description>&lt;P&gt;Thank you for your reply. I am using non-seasonal data for validation and hence selected LL. &lt;/P&gt;

&lt;P&gt;I understand that there is some difference between the Kalman filter in Python (filterpy library) and the one in Splunk. But the predicted means are about the same in both cases, while the confidence intervals are significantly apart, which puzzles me. My understanding is that the upper95 bound should be mean+1.96*variance. If the means are similar, then the variance must differ between the two algorithms. More details on that would be highly appreciated.&lt;/P&gt;
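
&lt;P&gt;For reference, a normal-approximation 95% bound is conventionally the mean plus 1.96 times the standard deviation (the square root of the variance), not 1.96 times the variance itself. The following is a minimal Python sketch of a local level (random walk plus noise) filter that forms the bound this way. It is not Splunk's or filterpy's actual code; the function name and the values chosen for sigma2_eps, sigma2_eta, a1 and p1 are assumptions made purely for illustration.&lt;/P&gt;

```python
import math


def local_level_filter(y, sigma2_eps, sigma2_eta, a1, p1):
    """One-step-ahead predictions for a local level (random walk plus noise) model.

    Returns a list of (prediction, upper95) pairs, one per observation.
    All variance parameters are supplied by the caller (hypothetical values below).
    """
    a, p = a1, p1  # predicted state mean and variance
    out = []
    for obs in y:
        f = p + sigma2_eps                 # variance of the one-step-ahead prediction error
        upper95 = a + 1.96 * math.sqrt(f)  # bound uses the standard deviation, not the variance
        out.append((a, upper95))
        k = p / f                          # Kalman gain
        a = a + k * (obs - a)              # update the level with the new observation
        p = p * (1.0 - k) + sigma2_eta     # variance of the next predicted state
    return out


# First few Response_Time values from the sample data; the variances are made up.
y = [59.040042, 66.725715, 40.399476, 52.249948]
rows = local_level_filter(y, sigma2_eps=100.0, sigma2_eta=1.0, a1=y[0], p1=100.0)
for pred, upper in rows:
    print(f"{pred:.3f}  {upper:.3f}")
```

&lt;P&gt;Under these assumptions the width of the band depends only on the two noise variances and the initial state variance, so two implementations can agree closely on the predicted mean while producing very different bounds if those variances are chosen or estimated differently.&lt;/P&gt;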

&lt;P&gt;Regards&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 19:24:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436957#M124471</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2019-05-02T19:24:42Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436958#M124472</link>
      <description>&lt;P&gt;I am having a bit of difficulty understanding the outputs from Splunk and Python. You say the predictions are quite similar. Can you highlight in the output snippets what is similar and what is widely divergent?&lt;BR /&gt;
The numbers are very close to each other in your snapshot, and it's difficult to understand.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 19:33:48 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436958#M124472</guid>
      <dc:creator>Sukisen1981</dc:creator>
      <dc:date>2019-05-02T19:33:48Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436959#M124473</link>
      <description>&lt;P&gt;I apologize for the confusion. Actually both the algorithms start similarly and then deviate. So here is a sample from the data where they have deviated significantly. I have attached a screenshot to explain better.&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="alt text"&gt;&lt;img src="https://community.splunk.com/t5/image/serverpage/image-id/6969i637C0E7378F817BC/image-size/large?v=v2&amp;amp;px=999" role="button" title="alt text" alt="alt text" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 19:57:26 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436959#M124473</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2019-05-02T19:57:26Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436960#M124474</link>
      <description>&lt;P&gt;Hmm, not sure why the Python Upper95 curve (blue one) is below the Python prediction (yellow)? &lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 20:01:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436960#M124474</guid>
      <dc:creator>nnguyen_splunk</dc:creator>
      <dc:date>2019-05-02T20:01:25Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436961#M124475</link>
      <description>&lt;P&gt;Sorry for the typo. Reran the simulation and here are the results.&lt;/P&gt;

&lt;P&gt;_time   Splunk_RT   Splunk_RT_Upper95   Python_RT   Python_RT_Upper95   Actual_RT&lt;BR /&gt;
1   29  39  29  30  27&lt;BR /&gt;
2   30  41  31  32  33&lt;BR /&gt;
3   30  41  31  32  30&lt;BR /&gt;
4   29  39  29  30  26&lt;BR /&gt;
5   31  41  31  32  33&lt;BR /&gt;
6   31  41  32  33  32&lt;BR /&gt;
7   30  40  30  32  28&lt;BR /&gt;
8   32  42  32  34  35&lt;BR /&gt;
9   32  42  33  34  32&lt;BR /&gt;
10  31  41  31  32  28&lt;/P&gt;
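
&lt;P&gt;As a quick sanity check on the table above, the sketch below counts how often Actual_RT exceeds each upper bound and compares the average distance between each prediction and its bound. The helper names are hypothetical, and ten rows are far too few to judge true coverage.&lt;/P&gt;

```python
# Rows copied from the table above:
# (Splunk_RT, Splunk_RT_Upper95, Python_RT, Python_RT_Upper95, Actual_RT)
rows = [
    (29, 39, 29, 30, 27),
    (30, 41, 31, 32, 33),
    (30, 41, 31, 32, 30),
    (29, 39, 29, 30, 26),
    (31, 41, 31, 32, 33),
    (31, 41, 32, 33, 32),
    (30, 40, 30, 32, 28),
    (32, 42, 32, 34, 35),
    (32, 42, 33, 34, 32),
    (31, 41, 31, 32, 28),
]


def exceedances(rows, upper_index, actual_index=4):
    """Count rows whose actual value lies above the given upper bound."""
    return sum(1 for row in rows if row[actual_index] > row[upper_index])


def mean_half_width(rows, pred_index, upper_index):
    """Average distance between the prediction and its upper bound."""
    return sum(row[upper_index] - row[pred_index] for row in rows) / len(rows)


splunk_over = exceedances(rows, upper_index=1)
python_over = exceedances(rows, upper_index=3)
print(splunk_over, python_over)  # prints "0 3"

splunk_width = mean_half_width(rows, pred_index=0, upper_index=1)
python_width = mean_half_width(rows, pred_index=2, upper_index=3)
print(splunk_width, python_width)
```

&lt;P&gt;On these ten rows, none of the actual values exceeds the Splunk bound, while three exceed the Python bound, and the Splunk band is several times wider than the Python one.&lt;/P&gt;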

&lt;P&gt;P.S.: I can't upload any more screenshots.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 00:20:25 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436961#M124475</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2020-09-30T00:20:25Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436962#M124476</link>
      <description>&lt;P&gt;This is too little data for me to really judge, but I counted 3 out of 10 Actual_RT that are greater than Python_RT_Upper95. That's not good for a 95% confidence interval.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Sep 2020 00:20:31 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436962#M124476</guid>
      <dc:creator>nnguyen_splunk</dc:creator>
      <dc:date>2020-09-30T00:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436963#M124477</link>
      <description>&lt;P&gt;Do you think you can realistically predict response time? Is there a known pattern to learn from? In my experience, you will have better luck setting this up as a multivariate model that uses multiple input variables to get your target value.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 23:11:23 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436963#M124477</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-05-02T23:11:23Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436964#M124478</link>
      <description>&lt;P&gt;This is a sample from the original dataset. The objective is not to predict response time but to detect anomalies and define a confidence band. The bigger ask is to decide between the Splunk and Python algorithms going forward. &lt;BR /&gt;
Hope that helps.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2019 03:38:42 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436964#M124478</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2019-05-03T03:38:42Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436965#M124479</link>
      <description>&lt;P&gt;You are right in your count, but it is a small subset of the data to illustrate the point. Is it possible to share some more details on the underlying calculations behind the mean, variance and confidence interval, specifically for the LL algorithm?&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2019 03:45:36 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436965#M124479</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2019-05-03T03:45:36Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436966#M124480</link>
      <description>&lt;P&gt;The LL algorithm follows the book "Time Series Analysis by State Space Methods" by Durbin and Koopman, chapter 2, "Local Level Model".&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2019 15:43:55 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436966#M124480</guid>
      <dc:creator>nnguyen_splunk</dc:creator>
      <dc:date>2019-05-03T15:43:55Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436967#M124481</link>
      <description>&lt;P&gt;Thank you for your help. I will look into the book and get back.&lt;/P&gt;</description>
      <pubDate>Fri, 03 May 2019 20:08:00 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436967#M124481</guid>
      <dc:creator>jaideeplamba</dc:creator>
      <dc:date>2019-05-03T20:08:00Z</dc:date>
    </item>
    <item>
      <title>Re: Cannot reproduce Predict command confidence interval</title>
      <link>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436968#M124482</link>
      <description>&lt;P&gt;You should create your own limits so you understand exactly how they work and can adjust them as needed. You'll need to knock the dust off those old statistics books, find the equations for the UCL and LCL, and apply them here. I've posted about this a lot; here's an example.&lt;/P&gt;

&lt;P&gt;This is the output of a query after the arguments have been passed into the macro, so some of the numerical values represent values you pass in. These include the confidence interval.&lt;/P&gt;

&lt;PRE&gt;&lt;CODE&gt;| eval upper=if((count &amp;gt; pred),count,pred), lower=if((count &amp;lt; pred),count,pred), lower=if((lower == 0),"",lower) 
| eventstats avg(count) AS pred, stdev(count) as pred_stdev, by time, customer 
| eval upper=if((upper &amp;gt; (pred + (1 * pred_stdev))),((pred_stdev * 0.5) + pred),upper), lower=if((lower &amp;lt; (pred - (1 * pred_stdev))),((pred_stdev * 0.5) + pred),lower) 
| stats avg(count) AS pred, stdev(upper) AS ustdev, stdev(lower) AS lstdev stdev(count) as stdev by time, customer 
| eval low=(pred - (lstdev * (exact(3.1622776601683795)))), low=if((low &amp;lt; 0),1,low), high=(pred + (ustdev * (exact(3.1622776601683795)))), _time=time 
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 06 May 2019 13:27:12 GMT</pubDate>
      <guid>https://community.splunk.com/t5/Splunk-Search/Cannot-reproduce-Predict-command-confidence-interval/m-p/436968#M124482</guid>
      <dc:creator>skoelpin</dc:creator>
      <dc:date>2019-05-06T13:27:12Z</dc:date>
    </item>
  </channel>
</rss>

