I'd like to plot a numerical value over time (such as syslog event counts), along with a linear trendline (such as this: http://www.splunk.com/wiki/Community:Plotting_a_linear_trendline), but how can I extend the trendline into the future in order to predict approximately what the numerical value would be n days from now?
In order to do this we can start with 2 macros:
# macros.conf [lineartrend(2)] args = x, y description = Develop a linear trendline against a data set definition = eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2 | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX)) | eval yintercept=(sumY-(slope*sumX))/numevents | eval newY=(yintercept + (slope*$x$) - 5) | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))*((numevents*sumY2)-(sumY*sumY))) | eval R2=R*R [extendtrend(2)] args = newY, end description = For use after something like lineartrend(2). Extend the trendline into the future. definition = append [gentimes start=1 end=$end$ | rename starttime as _time | fields _time] | delta $newY$ as newY_delta | eventstats avg(newY_delta) as avg_newY_delta last($newY$) as lastY | eval pred_accum=if(isnull($newY$), avg_newY_delta, 0) | accum pred_accum | eval newY=if(isnull($newY$), pred_accum + lastY, $newY$)
For the lineartrend(2) macro, you pass in the time field (x value) as well as the numerical value (y value) of the data which is being plotted (in this case the count of events). This will create a new numerical field called newY which will be the y values for the trendline. However, the new trendline will only be plotted until "now". In order to extend it into the future we'll use the extendtrend(2) macro. For this macro, you pass the newY field (y value to plot) as well as the number of days to project into the future. The x value (timeline) will be created using gentimes.
The final search will look like this and includes a high and low threshold in order to show the intersection of the trend:
sourcetype=syslog | timechart count | `lineartrend(_time, count)` | `extendtrend(newY, 7)` | eval low_threshold=5000 | eval high_threshold=50000 | timechart values(count) as events values(newY) as linear_trend values(low_threshold) as low_threshold values(high_threshold) as high_threshold
The result should look something like this where the blue line is the plotted syslog event counts ending today, the yellow line is the trendline, which extends beyond today, and the high and low threshold lines:
If I were to increase the linear trend Line to a future date, like another month, basically what I want is to have the flexibility to choose a specified future date for the trend Line
I'm not sure why this isn't working for me, I'm assuming it's because the search query provided above isn't meant to be used verbatim. I'm just not sure what to replace with what. This is the error I'm getting:
Error in 'eventstats' command: You must specify a rename for the aggregation specifier on the dynamically evaluated field 'sum(eval(count*count))'.
I have a field called 'action' and I'm trying to trend a certain action to see where it will be in the next month.
Greetings, I just wanted to mention that there is one small error based on a traditional linear regression. Specifically the line: | eval newY=(yintercept + (slope*$x$) - 5) , the "-5" doesn't do much, and if anything, can cause problems if you are using a small data set that hasn't changed. For example, if you have a series of 4's, it will show newY as -1.
Thanks for this information it help me quite a bit, however I noticed that the extendtrend macro provided doesn't work properly unless you are searching with a bucket span of 1 day.
I used the following macro which gave me a more accurate picture.
args = x, y, end
description = Extends lineartrend(2), x and y should match the args provided to lineartrend, end should be the number of days into the future you would like to extend the trend.
definition = append [ gentimes start=1 end=$end$ |rename starttime as _time | fields _time] |eventstats values(yintercept) as yintercept, values(slope) as slope, values(numevents) as numevents | eval newY=(yintercept + (slope * _time)) | fields _time, $y$, newY
You know what worked for me? Using it with the predict command. The predict command will fill in your data set into the future. After that, you can easily use the trendline command on the field that contains the newly added forecast from the predict command.