Splunk Search

predict command and non realtime alerts

stephenmoorhous
Path Finder

Hi - I have a query to predict traffic and to highlight when the actual traffic goes above or below the prediction:

index=uk sourcetype=pxy.access 
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval excession=if(ordercount > ceiling, "1", "0") 
| eval recession=if(ordercount < floor, "-1", "0") 
| table _time,excession,recession,ordercount,ceiling,floor

where excession or recession will be 1 or -1 if the real value is outside the predicted range.

The issue is that I do not want to run it as a real-time search (to save on overhead). If I instead run it as a saved search over, say, 24 hours (to get enough data for accurate predictions), scheduled every 5 or 10 minutes so it stays up to date, there will be several times during the 24 hours where there is an anomaly - but I only want to report or alert on the last 5 minutes, i.e. since the last run.

Is this possible?

1 Solution

DalJeanis
Legend

Update - you should be able to replace the search with tstats, which is vastly less resource-intensive, since it should be looking at pre-summarized data. Try using this, instead of the first 2 lines in my code below, and let me know what happens...

| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
     AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m
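Assembled, the tstats version of the whole search would look roughly like this (an untested sketch - the predict pipeline is unchanged from my code below, and since tstats already buckets by _time, the timechart step is no longer needed):

| tstats count as ordercount 
    WHERE index=uk AND sourcetype=pxy.access 
     AND _time>=(now()-86700) AND _time<=(now()-300)
    BY _time span=5m
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval cession=case(ordercount > ceiling, "1", ordercount < floor, "-1", true(), "0") 
| eventstats max(_time) as maxtime
| where _time = maxtime AND cession!="0"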

Add to the beginning...

earliest=-1d@m-5m latest=@m-5m index=uk sourcetype=pxy.access | bin _time span=5m

...and add to the end...

| where _time >= now()-661

... for five minutes, or now()-961 for ten minutes. Because the data is binned into 5-minute increments and you are running every 5 minutes, when you alert, you are alerting on a 5-minute period that started ten minutes ago and ended five minutes ago.

Personally, I would probably run this every five minutes starting at 3 minutes after the hour:

earliest=-1d@m-3m latest=@m-3m index=uk sourcetype=pxy.access 
| bin _time span=5m
| timechart count(_raw) as ordercount 
| predict ordercount 
| rename upper95(prediction(ordercount)) as ceiling 
| rename lower95(prediction(ordercount)) as floor 
| eval cession=case(ordercount > ceiling, "1", ordercount < floor, "-1", true(), "0") 
| table _time,cession,ordercount,ceiling,floor
| eventstats max(_time) as maxtime
| where _time = maxtime AND cession!="0"
| eval message=if(cession="1","High Traffic, REJOICE!!!","Low Traffic, PANIC!!!")

...well, maybe not actually phrased that way, in production...


Now, of course, if you have a LOT of traffic, it might be best to just summarize five minutes' traffic into a summary index every five minutes, and then you can run the predict against the summary index instead of the raw data.
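For example, a scheduled search along these lines could populate the summary every five minutes (a sketch only - the summary index name traffic_summary is made up, and the index must already exist):

index=uk sourcetype=pxy.access earliest=-5m@m latest=@m 
| bin _time span=5m 
| stats count as ordercount by _time 
| collect index=traffic_summary

The predict pipeline would then read from index=traffic_summary instead of the raw events.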

Crud. I bet that you already can do that with tstats without any extra process to re-summarize it.



stephenmoorhous
Path Finder

cool - thanks

DalJeanis
Legend

@stephenmoorhouse - see the update. tstats is the way to go, if it works.


niketn
Legend

Have you checked out the Forecast Internet Traffic example in the Machine Learning Toolkit app? This will also give you alerting capabilities. Splunkbase has video tutorials and a link to the documentation: https://splunkbase.splunk.com/app/2890/.

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

stephenmoorhous
Path Finder

I would love to - and indeed I will have a play with it on my local machine. Unfortunately, in my company I don't have that kind of access; it's managed by a team in another country, etc.
