Splunk IT Service Intelligence
Highlighted

How to forecast for multiple hosts individually

New Member

Hello all,

I was trying to get some predictive alerts working, my only problem is the search I've written is limited to a single host, and I'm trying to manage 2300 servers.
This part of the code effectively identifies outliers in CPU usage.

index=perfmon host=<server> counter="% Processor Time" 
| timechart span=5min avg(Value) 
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
| `forecastviz(4, 2, "avg(Value)", 95)` 
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)

The following isolates the search to the last 30 minutes

| eval eTime=relative_time(_time, "-0M") | eval lTime=relative_time(now(), "-30M") | where eTime>=lTime 

My plan is to schedule this search to run every 30 minutes, and to alert/email when 'isOutlier=1'. This works great if I only have 1 server, or if I want to group them all together as a single object. But does anyone know of a way to apply this with a wildcard, and have it evaluate each host independently of the others?

0 Karma
Highlighted

Re: How to forecast for multiple hosts individually

Explorer

Try this:

 index=perfmon host=* counter="% Processor Time" 
 | timechart span=5min avg(Value) 
 | predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
 | `forecastviz(4, 2, "avg(Value)", 95)` 
 | eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
 | table host, isOutlier | search isOutlier=1

For the trigger conditions, set trigger alert when number of results are greater than 0 and trigger for each result with limited throttling set if you don't want to receive multiple email alerts

0 Karma
Highlighted

Re: How to forecast for multiple hosts individually

New Member

That doesn't seem to do the trick either.

I ran this search on 3 host's individually(server1,server2,server3). Then ran it with server* as I originally had it (which averages the data from the 3 servers), and with your modification (line 6).

The host field does return with your modification, but it's a null value. And it only detects the outliers as reflected from the combined average. (I verified this by including: table host, isOutlier, avg(value)) The CPU usage matched the average of the 3 rather than any 1 server's CPU at the time of an outlier.

0 Karma
Speak Up for Splunk Careers!

We want to better understand the impact Splunk experience and expertise has has on individuals' careers, and help highlight the growing demand for Splunk skills.