
How to integrate the results of multiple forecast time series to forecast another time series?

Engager

Hello!

I'm really new to Splunk's Machine Learning Toolkit, so any help would be greatly appreciated. Thank you.

I'm trying to forecast time series for multiple apps in my query. My current query is:

index=... report=1min_rollup apps="..." earliest="06/07/2017:10:00:00" latest="06/07/2017:11:00:00"
| stats sum(COUNT) as sum_count by _time, apps
| stats avg(sum_count) as avgCount by _time, apps
| bin _time span=5m
| eval time=_time%3600
| join origsourcetype time
    [ search index=... report=1min_rollup apps="..." earliest="06/07/2017:11:00:00" latest="06/07/2017:12:00:00"
    | stats sum(refCOUNT), as sum_refcount by _time, apps
    | bin _time span=5m
    | stats avg(sum_refcount) as avgrefCount, stdev(sum_refcount) as stdrefCount by _time, apps
    | eval time=_time%3600 ]
| eval State=case((avgCount <= (avgrefCount+stdrefCount)), "Green", true(), "Red")
| stats values(apps) by _time, State
| outputlookup eg.csv

This gives me the lookup table eg.csv which looks like:

_time    | State | values(apps)
hh:mm:ss | Green | app1 app2 app10
...

Now I want to forecast the state of the apps over this time series. Since the state is derived from the range in which avgCount falls, I think that rather than forecasting the state directly, we should forecast avgCount, avgrefCount, and stdrefCount and then calculate the state from those forecasts. Is this the right way forward? If so, how do I combine these forecast time series to calculate the state at any given time?

Thank you! Your help is greatly appreciated!


SplunkTrust

Okay, lots of interesting things in your code. I believe they are left over from earlier versions.

There will only ever be one record for each combination of _time and apps going into the second stats command, so the avg there has nothing to average. Did you want that bin command before the second stats?

Never mind, this should do the trick. Try this:

index=... report=1min_rollup apps="..." earliest="06/07/2017:10:00:00" latest="06/07/2017:12:00:00" 
| stats sum(COUNT) as sum_count, sum(refCOUNT) as sum_ref_count  by _time,apps 
| bin _time span=5m 
| stats avg(sum_count) as avgCount, avg(sum_ref_count ) as avgrefCount, 
        stdev(sum_ref_count ) as stdrefCount by _time, apps 

| eval time=_time%3600
| stats latest(_time) as _time, latest(avgCount) as avgCount, 
        earliest(avgrefCount) as avgrefCount, earliest(stdrefCount) as stdrefCount 
        by time, apps 

| eval State=case((avgCount <=(avgrefCount+stdrefCount )),"Green", true(),"Red") 
| stats values(apps) by _time, State 
| outputlookup eg.csv

Assumptions - (1) the earlier hour is the reference, rather than the later hour as in your code. (2) there are no other search differences in the stuff you left out. (3) COUNT and refCOUNT are both actual fields, rather than refCOUNT being a rename that you didn't show us.

On the other hand, if there is no refCOUNT field, you can do something like this:

index=... report=1min_rollup apps="..." earliest="06/07/2017:10:00:00" latest="06/07/2017:12:00:00" 
| addinfo
| eval info_mid_time = (info_max_time + info_min_time)/2
| eval refCOUNT=if(_time>=info_mid_time,COUNT,null())
| eval COUNT=if(_time>=info_mid_time,null(),COUNT)
| stats sum(COUNT) as sum_count, sum(refCOUNT) as sum_ref_count  by _time,apps 
| bin _time span=5m 

| stats avg(sum_count) as avgCount, avg(sum_ref_count ) as avgrefCount, 
        stdev(sum_ref_count ) as stdrefCount by _time, apps 

| eval time=_time%3600
| stats latest(_time) as _time, latest(avgCount) as avgCount, 
        earliest(avgrefCount) as avgrefCount, earliest(stdrefCount) as stdrefCount 
        by time, apps 

| eval State=case((avgCount <=(avgrefCount+stdrefCount )),"Green", true(),"Red") 
| stats values(apps) by _time, State 
| outputlookup eg.csv

If you want your reference to be multiple hours, just move the earlier bound of the search back and, instead of calculating info_mid_time, use info_max_time - 3600 as the split point.
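
A minimal sketch of that multi-hour variant, showing only the head of the search. Several things here are assumptions rather than anything confirmed in the thread: the 08:00:00 lower bound is just an illustrative value, info_split_time is a made-up field name, and the two if() tests are written so that everything before the last hour is the reference and the last hour is the data being checked. If you want it the other way round, swap the >= and < tests.

```spl
index=... report=1min_rollup apps="..." earliest="06/07/2017:08:00:00" latest="06/07/2017:12:00:00"
| addinfo
| eval info_split_time = info_max_time - 3600
| eval refCOUNT=if(_time<info_split_time,COUNT,null())
| eval COUNT=if(_time>=info_split_time,COUNT,null())
| stats sum(COUNT) as sum_count, sum(refCOUNT) as sum_ref_count by _time,apps
```

The rest of the pipeline (bin, the avg/stdev stats, eval time, and the State calculation) would follow as in the search above. One thing to watch: with several reference hours, earliest(avgrefCount) by time, apps will probably keep only the first reference hour for each time slot, so you may want a different aggregation there if the goal is to use all of the reference hours.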
