Splunk IT Service Intelligence

How to forecast for multiple hosts individually

clowdmike
New Member

Hello all,

I'm trying to get some predictive alerts working. My only problem is that the search I've written is limited to a single host, and I'm trying to manage 2300 servers.
This part of the search identifies outliers in CPU usage:

index=perfmon host=<server> counter="% Processor Time" 
| timechart span=5min avg(Value) 
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
| `forecastviz(4, 2, "avg(Value)", 95)` 
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)

The following restricts the results to the last 30 minutes:

| where _time >= relative_time(now(), "-30m")

My plan is to schedule this search to run every 30 minutes, and to alert/email when 'isOutlier=1'. This works great if I only have 1 server, or if I want to group them all together as a single object. But does anyone know of a way to apply this with a wildcard, and have it evaluate each host independently of the others?

paranjith
Explorer

Try this:

 index=perfmon host=* counter="% Processor Time" 
 | timechart span=5min avg(Value) 
 | predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
 | `forecastviz(4, 2, "avg(Value)", 95)` 
 | eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
 | table host, isOutlier | search isOutlier=1

For the trigger conditions, set the alert to trigger when the number of results is greater than 0, and choose to trigger once for each result. Enable throttling if you don't want to receive multiple email alerts.

clowdmike
New Member

That doesn't seem to do the trick either.

I ran this search on 3 hosts individually (server1, server2, server3). Then I ran it with server* as I originally had it (which averages the data from the 3 servers), and with your modification (line 6).

The host field is returned with your modification, but its value is null, and outliers are detected only against the combined average. (I verified this by including: table host, isOutlier, avg(Value).) The CPU usage matched the average of the 3 servers rather than any one server's CPU at the time of an outlier.
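
The behaviour described above is consistent with how timechart works: without a by clause it emits a single series for all matching hosts, so the host field no longer exists by the time predict and table run. One possible workaround, offered as an untested sketch and not as a drop-in replacement, is to keep host as a grouping field and compute the threshold per host with streamstats. Note that this swaps the predict/LLP forecast for a simpler rolling baseline (mean plus 2 standard deviations over the previous 288 five-minute buckets). The field names cpu, baseline, sd and upper, the 24-hour lookback, and the factor of 2 are all assumptions made for illustration.

```spl
index=perfmon host=server* counter="% Processor Time" earliest=-24h
| bin _time span=5m
| stats avg(Value) as cpu by _time, host
| sort 0 _time
| streamstats current=f global=f window=288 avg(cpu) as baseline stdev(cpu) as sd by host
| eval upper=baseline + 2 * sd
| eval isOutlier=if(cpu > upper, 1, 0)
| where isOutlier=1 AND _time >= relative_time(now(), "-30m")
| table _time, host, cpu, upper
```

Here global=f asks streamstats to keep a separate window for each host, and current=f leaves the current bucket out of its own baseline. Each returned row carries its own host value, so an alert set to trigger for each result should fire once per server instead of once for the combined average.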
