hi - I have a query to predict traffic and highlight when the actual traffic goes over or below the prediction
index=uk sourcetype=pxy.access
| timechart count(_raw) as ordercount
| predict ordercount
| rename upper95(prediction(ordercount)) as ceiling
| rename lower95(prediction(ordercount)) as floor
| eval excession=if(ordercount > ceiling, "1", "0")
| eval recession=if(ordercount < floor, "-1", "0")
| table _time, excession, recession, ordercount, ceiling, floor
where excession or recession will be 1 or -1 if the real value is outside the predicted band.
The issue is that I do not want to run it as a real-time search (to save on overhead). If I instead run it as a saved search over, say, 24 hours (to get enough data for accurate predictions), scheduled every 5 or 10 minutes so it stays up to date, there will be several points during the 24 hours where there is an anomaly - but I only want to report or alert on the last 5 minutes, i.e. since the last run.
is this possible?
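For readers less familiar with SPL, the flagging logic in the query above reduces to a simple band comparison. A minimal Python sketch with made-up counts and band values (the real ceiling/floor come from Splunk's predict command):

```python
# A plain-Python sketch of the excession/recession flags the eval
# statements compute: each actual count is compared to the predicted
# upper (ceiling) and lower (floor) 95% confidence bounds.
# All numbers here are made up for illustration.
rows = [
    # (bucket, ordercount, ceiling, floor)
    ("10:00", 120, 150, 90),
    ("10:05", 160, 150, 90),  # above the ceiling -> excession
    ("10:10",  80, 150, 90),  # below the floor   -> recession
]

flags = []
for bucket, ordercount, ceiling, floor in rows:
    excession = 1 if ordercount > ceiling else 0
    recession = -1 if ordercount < floor else 0
    flags.append((bucket, excession, recession))

print(flags)
```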
Update - you should be able to replace the search with tstats, which is vastly less resource-intensive, since it should be looking at pre-summarized data. Try using this, instead of the first 2 lines in my code below, and let me know what happens...
| tstats count as ordercount
WHERE index=uk AND sourcetype=pxy.access
AND _time>=(now()-86700) AND _time<=(now()-300)
BY _time span=5m
Add to the beginning...
earliest=-1d@m-5m latest=@m-5m index=uk sourcetype=pxy.access | bin _time span=5m
...and add to the end...
| where _time >= now()-661
...for a five-minute schedule, or now()-961 for ten minutes. Because the data is binned into 5-minute buckets and you are running every 5 minutes, when you alert, you are alerting on a 5-minute period that started ten minutes ago and ended five minutes ago.
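The bucket arithmetic behind that cutoff can be checked outside Splunk. A hedged Python sketch (the epoch values are arbitrary, and it assumes the search window ends 5 minutes back, as in the tstats and earliest/latest examples above):

```python
SPAN = 300  # 5-minute buckets, matching "bin _time span=5m"

def bin_time(epoch, span=SPAN):
    # Mimic SPL's bin: snap an event time to the start of its bucket.
    return epoch - (epoch % span)

now = 1_700_000_113          # arbitrary run time, 13s past a bucket boundary
raw_events = [now - 30, now - 400, now - 700, now - 1000]

# The search window ends 5 minutes back (latest=@m-5m / now()-300),
# so the in-progress bucket never enters the data.
events = [e for e in raw_events if e <= now - 300]

buckets = sorted({bin_time(e) for e in events})

# "where _time >= now()-661" keeps only the most recent complete bucket,
# with a ~60-second margin for scheduler jitter.
recent = [b for b in buckets if b >= now - 661]
print(recent)  # only the last complete 5-minute bucket survives
```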
Personally, I would probably run this every five minutes starting at 3 minutes after the hour:
earliest=-1d@m-3m latest=@m-3m index=uk sourcetype=pxy.access
| bin _time span=5m
| timechart count(_raw) as ordercount
| predict ordercount
| rename upper95(prediction(ordercount)) as ceiling
| rename lower95(prediction(ordercount)) as floor
| eval cession=case(ordercount > ceiling, 1, ordercount < floor, -1, true(), 0)
| table _time,cession,ordercount,ceiling,floor
| eventstats max(_time) as maxtime
| where _time = maxtime AND (cession!=0)
| eval message=if(cession=1,"High Traffic, REJOICE!!!","Low Traffic, PANIC!!!")
...well, maybe not actually phrased that way, in production...
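The tail of that pipeline (eventstats to find the newest bucket, then keep it only if it is anomalous) reduces to a simple filter. A Python sketch with made-up rows, mirroring the SPL field names:

```python
# Rows as they look after the table command: (_time, cession), where
# cession is 1 above the ceiling, -1 below the floor, 0 inside the band.
rows = [
    (1_700_000_100, 1),   # an older anomaly, already alerted on a prior run
    (1_700_000_400, 0),
    (1_700_000_700, -1),  # the newest bucket: this is the one to alert on
]

# eventstats max(_time) as maxtime: the newest timestamp across all rows.
maxtime = max(t for t, _ in rows)

# where _time = maxtime AND cession != 0: only the latest bucket can alert.
alerts = [(t, c) for t, c in rows if t == maxtime and c != 0]

for t, cession in alerts:
    message = "High traffic" if cession == 1 else "Low traffic"
    print(t, message)
```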
Now, of course, if you have a LOT of traffic, it might be best to just summarize five minutes' traffic into a summary index every five minutes, and then you can run the predict against the summary index instead of the raw data.
Crud. I bet that you already can do that with tstats without any extra process to re-summarize it.
cool - thanks
@stephenmoorhouse - see the update. tstats is the way to go, if it works.
Have you checked out the Forecast Internet Traffic example in the Machine Learning Toolkit app? It also gives you alerting capabilities. Splunkbase has video tutorials and a link to the documentation: https://splunkbase.splunk.com/app/2890/.
I would love to - and indeed I will have a play with it on my local machine - but unfortunately, in my company I don't have that kind of access; it's managed by a team in another country, etc.