Splunk Search

predict command and non-realtime alerts

Path Finder

Hi - I have a query that predicts traffic and highlights when the actual traffic goes above or below the prediction:

index=uk sourcetype=pxy.access 
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval excession=if(ordercount > ceiling, "1", "0") 
| eval recession=if(ordercount < floor, "-1", "0") 
| table _time,excession,recession,ordercount,ceiling,floor

where excession or recession will be 1 or -1 if the real value falls outside the predicted range.

The issue is that I do not want to run it as a real-time search (to save on overhead). If I instead run it as a saved search over, say, 24 hours (to get enough data for accurate predictions), scheduled every 5 or 10 minutes so it stays up to date, there will be several points during the 24 hours where there is an anomaly - but I only want to report or alert on the last 5 minutes, i.e. since the last run.

Is this possible?

1 Solution

SplunkTrust

Update - you should be able to replace the base search with tstats, which is vastly less resource-intensive since it works from pre-summarized index data. Try this instead of the first two lines in my code below, and let me know what happens...

| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
    AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m

Add to the beginning...

earliest=-1d@m-5m latest=@m-5m index=uk sourcetype=pxy.access | bin _time span=5m

...and add to the end...

| where _time >= now()-661

... for five minutes, or now()-961 for ten minutes. Because the data is binned into 5-minute increments and you are running the search every 5 minutes, each alert covers a 5-minute period that started ten minutes ago and ended 5 minutes ago.
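Putting those pieces together, the tstats-based version of the original search would look something like this (a sketch, untested - tstats already bins by _time, so no separate bin step is needed):

```
| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
    AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval excession=if(ordercount > ceiling, 1, 0) 
| eval recession=if(ordercount < floor, -1, 0) 
| where _time >= now()-661
```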

Personally, I would probably run this every five minutes starting at 3 minutes after the hour:

earliest=-1d@m-3m latest=@m-3m index=uk sourcetype=pxy.access 
| bin _time span=5m
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval cession=case(ordercount > ceiling, 1, ordercount < floor, -1, true(), 0) 
| table _time,cession,ordercount,ceiling,floor
| eventstats max(_time) as maxtime
| where _time = maxtime AND cession!=0
| eval message=if(cession=1,"High Traffic, REJOICE!!!","Low Traffic, PANIC!!!")

...well, maybe not actually phrased that way, in production...


Now, of course, if you have a LOT of traffic, it might be best to just summarize five minutes' traffic into a summary index every five minutes, and then you can run the predict against the summary index instead of the raw data.
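The summary-index route would be a small scheduled search that rolls up each 5-minute window with collect, then the predict search runs against the summary index instead of the raw events. A sketch (the index name summary_uk is hypothetical - use whatever summary index you have write access to):

```
index=uk sourcetype=pxy.access earliest=-5m@m latest=@m 
| stats count as ordercount 
| collect index=summary_uk
```

The prediction search then becomes index=summary_uk | timechart sum(ordercount) as ordercount span=5m | predict ordercount ... and only ever touches one pre-counted row per 5-minute window.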

Crud. I bet that you already can do that with tstats without any extra process to re-summarize it.



Path Finder

cool - thanks

SplunkTrust

@stephenmoorhouse - see the update. tstats is the way to go, if it works.


SplunkTrust

Have you checked out the Forecast Internet Traffic example in the Machine Learning Toolkit app? It also gives you alerting capabilities. Splunkbase has video tutorials and a link to the documentation: https://splunkbase.splunk.com/app/2890/.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Path Finder

I would love to - and I will have a play with it on my local machine - but unfortunately at my company I don't have that kind of access rights; the environment is managed by a team in another country.
