Cannot reproduce Predict command confidence interval

jaideeplamba

Explorer

05-01-2019
10:18 PM

Dear Team,

I understand the predict command uses Kalman filters. I am comparing our existing Kalman implementation in Python (which uses filterpy) to Splunk. Interestingly, the Kalman means (predictions) are similar in Splunk and Python, but the confidence intervals are far apart. Your help would be really appreciated. This is a roadblock for us in moving entirely to Splunk.

My Splunk query:

|inputlookup sample_predict_data.csv
|fields _time,Response_Time
|eval _time=strptime(_time, "%Y-%m-%dT%H:%M:%S.%3N%:z")
|convert num(_span)
|fields _time,Response_Time
|timechart span=5m avg(Response
|predict "Response

sample_predict_data.csv:

Itr  Response_Time
0    59.040042
1    66.725715
2    40.399476
3    52.249948
4    48.609610
5    40.946166
6    52.468450
7    61.404242
8    35.637950
9    59.458336
10   40.836213

Sample output from Splunk:

   _time                      Response_Time  upper95(prediction(Response_Time))
1  2019-04-11 02:30:00-05:00  66.725715      73.969822
2  2019-04-11 02:35:00-05:00  40.399476      63.764238
3  2019-04-11 02:40:00-05:00  52.249948      63.185333
4  2019-04-11 02:45:00-05:00  48.609610      61.507178
5  2019-04-11 02:50:00-05:00  40.946166      57.659094
6  2019-04-11 02:55:00-05:00  52.468450      59.473529
7  2019-04-11 03:00:00-05:00  61.404242      63.887708
8  2019-04-11 03:05:00-05:00  35.637950      57.279082
9  2019-04-11 03:10:00-05:00  59.458336      61.778312

Sample Output from Python

Prediction  PythonUpperBound
59.083758   64.242979
52.332217   49.047618
53.685847   48.838873
53.086523   47.206724
48.965072   42.431759
51.753461   46.112887
54.099557   53.597780
52.231218   45.280999
50.658427   51.773081

Re: Cannot reproduce Predict command confidence interval

nnguyen_splunk

Splunk Employee

05-02-2019
09:39 AM

Hello,

Since you set period=96, you may want to use algorithm=LLP5. The 'LL' algorithm is for non-periodic data.

Re: Cannot reproduce Predict command confidence interval

Sukisen1981

Champion

05-02-2019
10:13 AM

The key point, from the Splunk docs: all the algorithms are variations based on the Kalman filter.

https://docs.splunk.com/Documentation/Splunk/7.2.6/SearchReference/Predict

So, in essence, there are bound to be differences when you compare this with an exact Kalman implementation in Python. If, for instance, you had used linear regression or k-means clustering for other use cases, the outputs would have tallied exactly with the Python libraries.

It depends on your use case, but within the limitations of the Kalman filter in general, Splunk does a pretty good job of implementing it.
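One way two Kalman variants can agree on the predicted mean yet disagree on the interval is through their noise settings. The sketch below is a plain local level filter (random walk plus measurement noise), not Splunk's or filterpy's actual implementation. The function name and every noise value are assumptions chosen for illustration; only the Response_Time values come from the sample data in this thread. It runs the filter twice with the same ratio of process noise to measurement noise but different absolute scales.

```python
import math

def filter_means_and_vars(observations, process_var, obs_var, init_var):
    """Hypothetical local level Kalman filter.

    Starts at the first observation and returns, for each later observation,
    the one-step-ahead forecast mean and the forecast variance.
    """
    mean, var = observations[0], init_var
    means, forecast_vars = [], []
    for y in observations[1:]:
        # Predict: the level is a random walk, so only the variance grows.
        var_pred = var + process_var
        # Forecast variance for the next observation adds measurement noise.
        forecast_var = var_pred + obs_var
        means.append(mean)
        forecast_vars.append(forecast_var)
        # Update: move the mean toward the observation by the Kalman gain.
        gain = var_pred / forecast_var
        mean += gain * (y - mean)
        var = (1.0 - gain) * var_pred
    return means, forecast_vars

# Response_Time values from the sample data in this thread.
data = [59.040042, 66.725715, 40.399476, 52.249948, 48.609610, 40.946166,
        52.468450, 61.404242, 35.637950, 59.458336, 40.836213]

# Assumed noise settings: run B is run A with every variance multiplied by 4.
means_a, vars_a = filter_means_and_vars(data, process_var=1.0, obs_var=25.0, init_var=25.0)
means_b, vars_b = filter_means_and_vars(data, process_var=4.0, obs_var=100.0, init_var=100.0)

for m_a, m_b, v_a, v_b in zip(means_a, means_b, vars_a, vars_b):
    half_a = 1.96 * math.sqrt(v_a)
    half_b = 1.96 * math.sqrt(v_b)
    print(f"mean A={m_a:.3f} mean B={m_b:.3f} half-width A={half_a:.3f} half-width B={half_b:.3f}")
```

In this sketch the gain depends only on the ratio of the variances, so both runs produce the same means, while the interval half-width in run B is twice that of run A. Whether this explains the Splunk and filterpy gap depends on how each one estimates its noise terms, which the thread does not establish.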

Re: Cannot reproduce Predict command confidence interval

jaideeplamba

Explorer

05-02-2019
12:24 PM

Thank you for your reply. I am using non-seasonal data for validation and hence selected LL.

I can understand that there is some difference between Kalman in Python (the filterpy library) and in Splunk. But the predicted means are around the same values in both cases. The confidence intervals are significantly apart, which is puzzling me. My understanding is that the upper95 bound should be mean+1.96*variance. If the means are similar, then the variance must differ between the two algorithms. More details on that would be highly appreciated.

Regards
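A note on the formula above: for a Gaussian forecast, the usual 95% bound adds 1.96 standard deviations to the mean, that is, 1.96 times the square root of the variance rather than the variance itself. A second thing worth checking is which variance is used: the variance of the filtered state alone, or the variance of the next observation, which also includes measurement noise. The sketch below illustrates both points. The helper names and the numbers are hypothetical, and nothing here is confirmed as what Splunk or filterpy computes.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution

def upper95_from_variance(mean, variance):
    """Upper 95% bound for a Gaussian: mean plus 1.96 standard deviations."""
    return mean + Z_95 * math.sqrt(variance)

def forecast_variance(state_var, obs_var):
    """Variance of the next observation: state uncertainty plus measurement noise."""
    return state_var + obs_var

# Hypothetical values, chosen only to show the size of the effect.
mean = 30.0
state_var = 1.0   # uncertainty about the underlying level
obs_var = 24.0    # measurement noise variance

# Bound on the level itself (state variance only).
bound_on_level = upper95_from_variance(mean, state_var)

# Bound on the next observed value (state variance plus measurement noise).
bound_on_observation = upper95_from_variance(mean, forecast_variance(state_var, obs_var))

print(f"upper95 for the level:       {bound_on_level:.2f}")
print(f"upper95 for the observation: {bound_on_observation:.2f}")
```

With these made-up numbers the same mean yields a bound near 32 from the state variance and near 40 once measurement noise is included, so the two choices can produce very different intervals around similar predictions.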

Re: Cannot reproduce Predict command confidence interval

Sukisen1981

Champion

05-02-2019
12:33 PM

The numbers are very close to each other in your snapshot, and it's difficult to understand.

Re: Cannot reproduce Predict command confidence interval

jaideeplamba

Explorer

05-02-2019
12:57 PM

Re: Cannot reproduce Predict command confidence interval

nnguyen_splunk

Splunk Employee

05-02-2019
01:01 PM

Hmm, not sure why the Python Upper95 curve (blue one) is below the Python prediction (yellow)?

Re: Cannot reproduce Predict command confidence interval

jaideeplamba

Explorer

05-02-2019
01:36 PM

Sorry for the typo. Reran the simulation and here are the results.

_time  Splunk_RT  Splunk_RT_Upper95  Python_RT  Python_RT_Upper95  Actual_RT
1      29         39                 29         30                 27
2      30         41                 31         32                 33
3      30         41                 31         32                 30
4      29         39                 29         30                 26
5      31         41                 31         32                 33
6      31         41                 32         33                 32
7      30         40                 30         32                 28
8      32         42                 32         34                 35
9      32         42                 33         34                 32
10     31         41                 31         32                 28

P.S.: Can't upload any more screenshots.

Re: Cannot reproduce Predict command confidence interval

nnguyen_splunk

Splunk Employee

05-02-2019
02:46 PM

_RT that are greater than Python_RT_Upper95. That's not good for a 95% confidence interval.
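The check described here can be reproduced directly from the ten rows posted earlier in the thread. The sketch below counts how often the actual value exceeds each upper bound. The helper name is hypothetical, it looks only at the upper bound since no lower bounds were posted, and ten rows is too small a sample to judge true coverage.

```python
# Rows copied from the table posted earlier in the thread:
# (_time, Splunk_RT, Splunk_RT_Upper95, Python_RT, Python_RT_Upper95, Actual_RT)
rows = [
    (1, 29, 39, 29, 30, 27),
    (2, 30, 41, 31, 32, 33),
    (3, 30, 41, 31, 32, 30),
    (4, 29, 39, 29, 30, 26),
    (5, 31, 41, 31, 32, 33),
    (6, 31, 41, 32, 33, 32),
    (7, 30, 40, 30, 32, 28),
    (8, 32, 42, 32, 34, 35),
    (9, 32, 42, 33, 34, 32),
    (10, 31, 41, 31, 32, 28),
]

ACTUAL = 5          # column index of Actual_RT
SPLUNK_UPPER = 2    # column index of Splunk_RT_Upper95
PYTHON_UPPER = 4    # column index of Python_RT_Upper95

def count_exceedances(table, upper_index, actual_index=ACTUAL):
    """Count rows where the actual value is strictly above the upper bound."""
    return sum(1 for row in table if row[actual_index] > row[upper_index])

splunk_exceed = count_exceedances(rows, SPLUNK_UPPER)
python_exceed = count_exceedances(rows, PYTHON_UPPER)

print(f"Actual_RT above Splunk_RT_Upper95: {splunk_exceed} of {len(rows)}")
print(f"Actual_RT above Python_RT_Upper95: {python_exceed} of {len(rows)}")
```

On these rows the actual value is above the Python bound 3 times out of 10 and never above the Splunk bound.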

Re: Cannot reproduce Predict command confidence interval

jaideeplamba

Explorer

05-02-2019
08:45 PM

You are right in your count. But it is a small subset of data to illustrate the point. Is it possible to share more details on the underlying calculations behind the mean, variance, and confidence interval for the LL algorithm specifically?