Splunk Search

predict command and non-realtime alerts

Path Finder

Hi - I have a query that predicts traffic and highlights when the actual traffic goes above or below the prediction:

index=uk sourcetype=pxy.access 
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval excession=if(ordercount > ceiling, "1", "0") 
| eval recession=if(ordercount < floor, "-1", "0") 
| table _time,excession,recession,ordercount,ceiling,floor

where excession or recession will be 1 or -1 if the real value falls outside the predicted range.

The issue is that I do not want to run it as a real-time search (to save on overhead). If I instead run it as a saved search over, say, 24 hours (to get enough data for accurate predictions), scheduled every 5 or 10 minutes so it stays up to date, there will be several points during the 24 hours where there is an anomaly - but I only want to report or alert on the last 5 minutes, i.e. since the last run.

Is this possible?

1 Solution

SplunkTrust

Update - you should be able to replace the base search with tstats, which is vastly less resource-intensive since it works from pre-summarized index data. Try this instead of the first two lines in my code below, and let me know what happens...

| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
    AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m

Add to the beginning...

earliest=-1d@m-5m latest=@m-5m index=uk sourcetype=pxy.access | bin _time span=5m

...and add to the end...

| where _time >= now()-661

... for five minutes, or now()-961 for ten minutes. Because the data is binned into 5-minute increments and you are running the search every 5 minutes, each alert covers a 5-minute period that started ten minutes ago and ended 5 minutes ago.
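Putting those pieces together, the tstats-based version of the original search would look something like this (a sketch, untested - tstats already bins by _time, so no separate bin step is needed):

```
| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
    AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval excession=if(ordercount > ceiling, 1, 0) 
| eval recession=if(ordercount < floor, -1, 0) 
| where _time >= now()-661
```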

Personally, I would probably run this every five minutes starting at 3 minutes after the hour:

earliest=-1d@m-3m latest=@m-3m index=uk sourcetype=pxy.access 
| bin _time span=5m
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval cession=case(ordercount > ceiling, 1, ordercount < floor, -1, true(), 0) 
| table _time,cession,ordercount,ceiling,floor
| eventstats max(_time) as maxtime
| where _time = maxtime AND cession!=0
| eval message=if(cession=1,"High Traffic, REJOICE!!!","Low Traffic, PANIC!!!")

...well, maybe not actually phrased that way, in production...


Now, of course, if you have a LOT of traffic, it might be best to just summarize five minutes' traffic into a summary index every five minutes, and then you can run the predict against the summary index instead of the raw data.
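The summary-index route would be a small scheduled search that rolls up each 5-minute window with collect, then the predict search runs against the summary index instead of the raw events. A sketch (the index name summary_uk is hypothetical - use whatever summary index you have write access to):

```
index=uk sourcetype=pxy.access earliest=-5m@m latest=@m 
| stats count as ordercount 
| collect index=summary_uk
```

The prediction search then becomes index=summary_uk | timechart sum(ordercount) as ordercount span=5m | predict ordercount ... and only ever touches one pre-counted row per 5-minute window.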

Crud. I bet that you already can do that with tstats without any extra process to re-summarize it.



Path Finder

cool - thanks

SplunkTrust

@stephenmoorhouse - see the update. tstats is the way to go, if it works.


SplunkTrust

Have you checked out the Forecast Internet Traffic example in the Machine Learning Toolkit app? It also gives you alerting capabilities. Splunkbase has video tutorials and a link to the documentation: https://splunkbase.splunk.com/app/2890/.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

Path Finder

I would love to - and I will have a play with it on my local machine - but unfortunately at my company I don't have that kind of access rights; the environment is managed by a team in another country.
