Solved: duplicate in dates for stats when using predict

mjm295 · 08-15-2017

I have this query to predict CPU usage, looking at real data for the last 90 days and predicting 60 days ahead.

index="linux_capacity"  source=cpu CPU=all  host=ip-10-134*  | eval PctUsed = 100 - pctIdle 
| timechart avg(PctUsed) as PercentUsed 
| predict "PercentUsed" as futures algorithm=LLP future_timespan=60
| eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )
| eval lower95(futures)=if(_time<=now(), Null, 'lower95(futures)' )

Looking at the stats (results), the 10 days running back from today are duplicated. Today is 16th August. Here is a snippet of the stats (columns: _time, PercentUsed, futures, lower95(futures), upper95(futures)):

2017-08-02  10.345810   11.2606080643        
2017-08-03  8.371493    11.6832498048        
2017-08-04  8.287087    10.2299365809        
2017-08-05  12.312134   12.2315872649        
2017-08-06  11.367797   10.9899627817        
2017-08-07  17.745977   14.2295366964        
2017-08-08  10.109057   10.1245616922        
2017-08-09  17.496496   14.0287175836        
2017-08-10  8.339878    11.2479039882        
2017-08-11  8.737030    10.0940590718        
2017-08-12  8.032037    9.39042740568        
2017-08-13  7.555324    9.33242169748        
2017-08-14  9.514418    11.8174795236        
2017-08-15  8.862755    8.98957755123        
2017-08-16  8.136355    11.4131114138        
2017-08-06              11.2479039882        
2017-08-07              10.0940590718        
2017-08-08              9.39042740568        
2017-08-09              9.33242169748        
2017-08-10              11.8174795236        
2017-08-11              8.98957755123        
2017-08-12              11.4131114138        
2017-08-13              11.2479039882        
2017-08-14              10.0940590718        
2017-08-15              9.39042740568        
2017-08-16              9.33242169748        
2017-08-17              11.8174795236   4.01416734251   19.6207917047
2017-08-18              8.98957755123   -0.453621346862 18.4327764493
2017-08-19              11.4131114138   1.74019160299   21.0860312246
2017-08-20              11.2479039882   0.874637979426  21.6211699969
2017-08-21              10.0940590718   4.39114905157   15.796969092
2017-08-22              9.39042740568   -4.25403965674  23.0348944681

So the real data stops at 2017-08-16,
BUT then the predicted data starts again from 2017-08-06,
before the 95th percentile bands kick in when the second pass reaches 2017-08-17.

What could be causing this? It makes the graph I am creating look messy.

Thanks
Mark

DalJeanis · 08-16-2017

No such issue on 6.4.7, by my test. Although, I have seen timechart add extra crud on the end sometimes.

Here's a workaround - add this to the end of the search.

| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime
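
For anyone who wants to see what that filter does before adding it to a real search, here is a small self-contained sketch. The makeresults rows and the field names n and day_offset are made up for illustration; the offsets 1, 2, 3, 4, 3, 5 mimic a series whose dates step backwards once. One caveat, offered as an assumption and not something confirmed in this thread: priorbesttime is null on the very first row, so a bare _time > priorbesttime would probably drop that row as well. The isnull() guard below is one way to keep it.

```spl
| makeresults count=6
| streamstats count as n
| eval day_offset = case(n<=4, n, n==5, 3, n==6, 5)
| eval _time = relative_time(now(), "@d") + day_offset * 86400
| streamstats current=f max(_time) as priorbesttime
| where isnull(priorbesttime) OR _time > priorbesttime
| fields - priorbesttime n
```

With these offsets, the fifth row (offset 3, which is earlier than the offset 4 already seen) should be filtered out, leaving one row per day in ascending order.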

Also, please note that THIS code is not doing what you think it is.

 | eval upper95(futures)=if(_time<=now(), Null, 'upper95(futures)' )

That code is equivalent to...

 | eval upper95(futures)=if(_time<=now(), SomeFieldNamedNullThatDoesntExistAndThereforeHappensToHaveANullValue, 'upper95(futures)' )

... as opposed to this, which specifies to return a null value.

 | eval upper95(futures)=if(_time<=now(), null(), 'upper95(futures)' )
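
The difference only shows up when a field called Null actually exists. As a hypothetical sketch (the makeresults row and the field names value, via_field, via_function and via_function_is_null are invented here, and it assumes, as described above, that eval reads a bare Null as a field reference):

```spl
| makeresults
| eval Null = "oops", value = 42
| eval via_field = if(true(), Null, value)
| eval via_function = if(true(), null(), value)
| eval via_function_is_null = if(isnull(via_function), "yes", "no")
```

Under that assumption, via_field would pick up the string stored in the Null field, while via_function stays empty, which is why null() is the safer spelling.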

mjm295 · 08-16-2017

Thanks for the "Null" clarification.

mjm295 · 08-16-2017

Thanks Dal, looking much tidier now. Just for completeness my final query is:

index="linux_capacity" source=cpu CPU=all host=ip-10-134*
| eval PctUsed = 100 - pctIdle
| timechart avg(PctUsed) as PercentUsed span=1h
| eval PercentUsed=round(PercentUsed,2)
| predict "PercentUsed" as futures algorithm=LLP future_timespan=960 lower90=low upper90=high
| eval futures=round(futures,2)
| eval high(futures)=if(_time<=now(), null(), 'high(futures)' )
| eval low(futures)=if(_time<=now(), null(), 'low(futures)' )
| eval low(futures)=if( 'low(futures)' < 0, 0, 'low(futures)' )
| streamstats current=f max(_time) as priorbesttime
| where _time > priorbesttime
| fields - priorbesttime

DalJeanis · 08-17-2017

@mjm295 - Thanks for posting that. It can help other people when they can see the solution that worked.

cmerriman · 08-16-2017

What version of Splunk are you using? I just ran your query with some of my own data and it worked fine. I'm on 6.6.2.

mjm295 · 08-16-2017

It's 6.5.1 to be exact.

mjm295 · 08-16-2017

Version 6.5 here.
