Solved: Is there a way to specify a different time range for predict?

kiamco · 08-09-2018

I know that the predict function becomes more accurate when you feed it more data, but I don't want to query 2 months' worth of data in a dashboard that would then take about 2 minutes to load. Is there a way to get a more accurate prediction without actively querying the past 2 months, or a way to do this with a different function? FYI, I do not have the authority to download the MLTK.

I know this is a tough question but would like to hear some ideas.

index=summary source="summary_events_2" 
orig_source=/var/log/pnr*
ms_region=us-west-1
ms_level=E*
| timechart span=15m  sum(count) as count 
| predict count as count_prediction period=7 algorithm=LLP5 future_timespan=10 holdback=0 upper50=high_prediction lower5=low_prediction
| rename high_prediction(count_prediction) as high_prediction
| eval deviation=count-round(count_prediction,0)
| streamstats window=300 current=true median(deviation) as median_of_residual
| eval abs_dev=(abs(deviation - median_of_residual))
| streamstats window=300 current=true median(abs_dev) as median_abs_dev
| eval upper_bound=if(median_of_residual + median_abs_dev * 5 < 0,abs(median_of_residual + median_abs_dev), median_of_residual + median_abs_dev * 5) 
| eval anomaly=if(deviation > upper_bound,1,0)
| predict deviation as deviation_prediction period=7 algorithm=LLP5 future_timespan=0 holdback=0 upper20=high_prediction lower20=low_prediction
| fields -  median_of_residual, median_abs_dev, abs_dev, high_prediction, bounds, count, count_prediction

woodcock · 08-09-2018

I agree with @DalJeanis. In particular, if this is the only search like this, report acceleration is the easiest and best option for you. If you could use MLTK, you could do a one-time learning over a huge time span and true this up periodically, but that's out. Also, check out this INCREDIBLE answer by @mmodestino here:

https://answers.splunk.com/answers/511894/how-to-use-the-timewrap-command-and-set-an-alert-f.html

kiamco · 08-09-2018

@mmodestino explained it so well. Thanks!!!

mattymo · 08-09-2018

this is even better

https://www.splunk.com/blog/2018/01/19/cyclical-statistical-forecasts-and-anomalies-part-1.html
https://www.splunk.com/blog/2018/02/05/cyclical-statistical-forecasts-and-anomalies-part-2.html
https://www.splunk.com/blog/2018/03/20/cyclical-statistical-forecasts-and-anomalies-part-3.html

A 3-part blog series by much smarter folks than me 😉

- MattyMo

DalJeanis · 08-09-2018

This is a good use case for an accelerated report, accelerated data model or a summary index. If your report is going to be based on summarized 15m increments, then it makes more sense for the system to be calculating each 15m increment once, rather than going back two months to do so.

Start with accelerating the report, which should work for your use case.

ACCELERATED REPORT

https://docs.splunk.com/Documentation/Splunk/7.1.2/Report/Acceleratereports

ACCELERATED DATA MODEL

https://docs.splunk.com/Documentation/Splunk/latest/Knowledge/Acceleratedatamodels

SUMMARY INDEXING

https://docs.splunk.com/Documentation/Splunk/7.1.2/Knowledge/Usesummaryindexing
https://www.splunk.com/view/SP-CAAACZW
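
For the summary-indexing route, a minimal sketch of the populating search might look like the following. It assumes a scheduled search that runs every 15 minutes on the quarter hour (for example, cron `*/15 * * * *`). The index name `pnr_15m_summary` and the source name `pnr_errors_15m` are made up for illustration, and the index would have to exist already. The base search is copied from the question.

```
index=summary source="summary_events_2" orig_source=/var/log/pnr* ms_region=us-west-1 ms_level=E* earliest=-15m@m latest=@m
| bin _time span=15m
| stats sum(count) as count by _time
| collect index=pnr_15m_summary source="pnr_errors_15m"
```

Each run writes one row per 15-minute bucket, so the event-level aggregation for a given bucket happens only once.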

kiamco · 08-09-2018

I thought of using a summary index too, but if I run a summary index every 15m, wouldn't it affect the accuracy of the predict? For example, a query with predict that runs over 2 months would get a more accurate prediction than one that runs over 4 hours, or am I misunderstanding the predict command? I am also not sure how the accelerated report works. I have read the documentation, but I don't really see how that would solve my issue.

DalJeanis · 08-09-2018

@kiamco - The summary index would contain the pre-summarized data. The predict could then run quickly across any length of time, and would not have to analyze the data at the event level ever again, which is what takes the majority of the CPU time.
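
As a rough sketch of the dashboard side, assuming the 15-minute counts have already been written to a summary index (the index name `pnr_15m_summary` and source `pnr_errors_15m` are hypothetical), the search can cover the full two months because it only reads about 96 pre-summarized rows per day. The predict arguments are the ones from the original question:

```
index=pnr_15m_summary source="pnr_errors_15m" earliest=-60d@d latest=now
| timechart span=15m sum(count) as count
| predict count as count_prediction period=7 algorithm=LLP5 future_timespan=10 holdback=0 upper50=high_prediction lower5=low_prediction
```

How often the summary is populated does not limit how much history predict sees. That is set by the time range of this search, so a 15-minute population schedule and a 60-day prediction window can coexist.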
