Splunk IT Service Intelligence

How to forecast for multiple hosts individually

New Member

Hello all,

I'm trying to get some predictive alerts working. My only problem is that the search I've written is limited to a single host, and I'm trying to manage 2300 servers.
This part of the search identifies outliers in CPU usage:

index=perfmon host=<server> counter="% Processor Time" 
| timechart span=5min avg(Value) 
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
| `forecastviz(4, 2, "avg(Value)", 95)` 
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)

The following restricts the results to the last 30 minutes:

| eval eTime=relative_time(_time, "-0M") | eval lTime=relative_time(now(), "-30M") | where eTime>=lTime 

My plan is to schedule this search to run every 30 minutes, and to alert/email when 'isOutlier=1'. This works great if I only have 1 server, or if I want to group them all together as a single object. But does anyone know of a way to apply this with a wildcard, and have it evaluate each host independently of the others?


Explorer

Try this:

 index=perfmon host=* counter="% Processor Time" 
 | timechart span=5min avg(Value) 
 | predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
 | `forecastviz(4, 2, "avg(Value)", 95)` 
 | eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
 | table host, isOutlier | search isOutlier=1

For the trigger conditions, set the alert to trigger when the number of results is greater than 0, and choose to trigger once for each result. Enable throttling if you don't want to receive multiple email alerts.


New Member

That doesn't seem to do the trick either.

I ran this search on 3 hosts individually (server1, server2, server3). Then I ran it with server* as I originally had it (which averages the data from the 3 servers), and with your modification (line 6).

The host field is returned with your modification, but its value is null, and the search only detects outliers in the combined average. I verified this by including: table host, isOutlier, avg(Value). The CPU usage matched the average of the 3 servers rather than any one server's CPU at the time of an outlier.

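The null host value described above is consistent with how timechart behaves: without a by clause it collapses all matching events into a single avg(Value) series per time bucket, so no host field survives to the later commands. The following is a minimal sketch of one way to evaluate each host on its own. Note that it swaps in a different technique from the original search: instead of predict with the LLP algorithm, it compares each 5-minute bucket against a per-host rolling baseline (mean plus two standard deviations over the previous 288 buckets, mirroring period=288). The field names cpu, baseline, sd and upper, the earliest=-25h window, and the factor of 2 are illustrative assumptions, not values from the thread.

```
index=perfmon host=server* counter="% Processor Time" earliest=-25h
| bin _time span=5m
| stats avg(Value) as cpu by _time, host
| sort 0 _time
| streamstats global=f window=288 current=f avg(cpu) as baseline, stdev(cpu) as sd by host
| eval upper=baseline + 2 * sd
| eval isOutlier=if(cpu > upper, 1, 0)
| where isOutlier=1 AND _time >= relative_time(now(), "-30m")
| table _time, host, cpu, upper
```

Here stats ... by _time, host keeps one row per host per bucket, and streamstats with global=f and a by clause maintains a separate 288-bucket window for each host, so each server is judged only against its own history. Because every result row carries a host value, the "trigger for each result" alert setting with throttling on host would then apply per server. Whether this threshold is sensitive enough for a given environment would need to be checked against real data.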