Dashboards & Visualizations

Plotting trendlines into the future

mw
Splunk Employee
Splunk Employee

I'd like to plot a numerical value over time (such as syslog event counts), along with a linear trendline (such as this: http://www.splunk.com/wiki/Community:Plotting_a_linear_trendline), but how can I extend the trendline into the future in order to predict approximately what the numerical value would be n days from now?

Tags (1)
1 Solution

mw
Splunk Employee
Splunk Employee

In order to do this we can start with 2 macros:

# macros.conf
[lineartrend(2)]
args = x, y
description = Develop a linear trendline against a data set
definition = eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2 
  | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX)) 
  | eval yintercept=(sumY-(slope*sumX))/numevents 
  | eval newY=(yintercept + (slope*$x$) - 5) 
  | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))*((numevents*sumY2)-(sumY*sumY))) 
  | eval R2=R*R

[extendtrend(2)]
args = newY, end
description = For use after something like lineartrend(2).  Extend the trendline into the future.
definition = append [gentimes start=1 end=$end$ | rename starttime as _time | fields _time] 
  | delta $newY$ as newY_delta 
  | eventstats avg(newY_delta) as avg_newY_delta last($newY$) as lastY 
  | eval pred_accum=if(isnull($newY$), avg_newY_delta, 0) 
  | accum pred_accum 
  | eval newY=if(isnull($newY$), pred_accum + lastY, $newY$)

For the lineartrend(2) macro, you pass in the time field (x value) as well as the numerical value (y value) of the data which is being plotted (in this case the count of events). This will create a new numerical field called newY which will be the y values for the trendline. However, the new trendline will only be plotted until "now". In order to extend it into the future we'll use the extendtrend(2) macro. For this macro, you pass the newY field (y value to plot) as well as the number of days to project into the future. The x value (timeline) will be created using gentimes.

The final search will look like this and includes a high and low threshold in order to show the intersection of the trend:

sourcetype=syslog | timechart count 
  | `lineartrend(_time, count)` 
  | `extendtrend(newY, 7)` 
  | eval low_threshold=5000 
  | eval high_threshold=50000 
  | timechart values(count) as events values(newY) as linear_trend values(low_threshold) as low_threshold values(high_threshold) as high_threshold

The result should look something like this where the blue line is the plotted syslog event counts ending today, the yellow line is the trendline, which extends beyond today, and the high and low threshold lines:

http://i.imgur.com/krn29.png

View solution in original post

whopper
Explorer

You know what worked for me? Using it with the predict command. The predict command will fill in your data set into the future. After that, you can easily use the trendline command on the field that contains the newly added forecast from the predict command.

cttorres
Explorer

Hi!
Can you give us an example please?

Thanks!

0 Karma

cpayne_satisnet
New Member

Thanks for this information it help me quite a bit, however I noticed that the extendtrend macro provided doesn't work properly unless you are searching with a bucket span of 1 day.

I used the following macro which gave me a more accurate picture.

[lineartrendextend(3)]
args = x, y, end
description = Extends lineartrend(2), x and y should match the args provided to lineartrend, end should be the number of days into the future you would like to extend the trend.
definition = append [ gentimes start=1 end=$end$ |rename starttime as _time | fields _time] |eventstats values(yintercept) as yintercept, values(slope) as slope, values(numevents) as numevents | eval newY=(yintercept + (slope * _time)) | fields _time, $y$, newY

0 Karma

mw
Splunk Employee
Splunk Employee

In order to do this we can start with 2 macros:

# macros.conf
[lineartrend(2)]
args = x, y
description = Develop a linear trendline against a data set
definition = eventstats count as numevents sum($x$) as sumX sum($y$) as sumY sum(eval($x$*$y$)) as sumXY sum(eval($x$*$x$)) as sumX2 sum(eval($y$*$y$)) as sumY2 
  | eval slope=((numevents*sumXY)-(sumX*sumY))/((numevents*sumX2)-(sumX*sumX)) 
  | eval yintercept=(sumY-(slope*sumX))/numevents 
  | eval newY=(yintercept + (slope*$x$) - 5) 
  | eval R=((numevents*sumXY) - (sumX*sumY))/sqrt(((numevents*sumX2)-(sumX*sumX))*((numevents*sumY2)-(sumY*sumY))) 
  | eval R2=R*R

[extendtrend(2)]
args = newY, end
description = For use after something like lineartrend(2).  Extend the trendline into the future.
definition = append [gentimes start=1 end=$end$ | rename starttime as _time | fields _time] 
  | delta $newY$ as newY_delta 
  | eventstats avg(newY_delta) as avg_newY_delta last($newY$) as lastY 
  | eval pred_accum=if(isnull($newY$), avg_newY_delta, 0) 
  | accum pred_accum 
  | eval newY=if(isnull($newY$), pred_accum + lastY, $newY$)

For the lineartrend(2) macro, you pass in the time field (x value) as well as the numerical value (y value) of the data which is being plotted (in this case the count of events). This will create a new numerical field called newY which will be the y values for the trendline. However, the new trendline will only be plotted until "now". In order to extend it into the future we'll use the extendtrend(2) macro. For this macro, you pass the newY field (y value to plot) as well as the number of days to project into the future. The x value (timeline) will be created using gentimes.

The final search will look like this and includes a high and low threshold in order to show the intersection of the trend:

sourcetype=syslog | timechart count 
  | `lineartrend(_time, count)` 
  | `extendtrend(newY, 7)` 
  | eval low_threshold=5000 
  | eval high_threshold=50000 
  | timechart values(count) as events values(newY) as linear_trend values(low_threshold) as low_threshold values(high_threshold) as high_threshold

The result should look something like this where the blue line is the plotted syslog event counts ending today, the yellow line is the trendline, which extends beyond today, and the high and low threshold lines:

http://i.imgur.com/krn29.png

msarro
Builder

Greetings, I just wanted to mention that there is one small error based on a traditional linear regression. Specifically the line: | eval newY=(yintercept + (slope*$x$) - 5) , the "-5" doesn't do much, and if anything, can cause problems if you are using a small data set that hasn't changed. For example, if you have a series of 4's, it will show newY as -1.

0 Karma

rps462
Path Finder

I'm not sure why this isn't working for me, I'm assuming it's because the search query provided above isn't meant to be used verbatim. I'm just not sure what to replace with what. This is the error I'm getting:

Error in 'eventstats' command: You must specify a rename for the aggregation specifier on the dynamically evaluated field 'sum(eval(count*count))'.
Linear

I have a field called 'action' and I'm trying to trend a certain action to see where it will be in the next month.

0 Karma

Dark_Ichigo
Builder

If I were to increase the linear trend Line to a future date, like another month, basically what I want is to have the flexibility to choose a specified future date for the trend Line

0 Karma
Get Updates on the Splunk Community!

Enterprise Security Content Update (ESCU) | New Releases

In December, the Splunk Threat Research Team had 1 release of new security content via the Enterprise Security ...

Why am I not seeing the finding in Splunk Enterprise Security Analyst Queue?

(This is the first of a series of 2 blogs). Splunk Enterprise Security is a fantastic tool that offers robust ...

Index This | What are the 12 Days of Splunk-mas?

December 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...