Splunk ITSI

How to forecast for multiple hosts individually

clowdmike
New Member

Hello all,

I'm trying to get some predictive alerts working. My only problem is that the search I've written is limited to a single host, and I need to manage 2,300 servers.
This part of the search effectively identifies outliers in CPU usage:

index=perfmon host=<server> counter="% Processor Time" 
| timechart span=5min avg(Value) 
| predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
| `forecastviz(4, 2, "avg(Value)", 95)` 
| eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)

The following restricts the results to the last 30 minutes:

| eval eTime=relative_time(_time, "-0M") | eval lTime=relative_time(now(), "-30M") | where eTime>=lTime 

My plan is to schedule this search to run every 30 minutes and to send an alert email when isOutlier=1. This works great if I have only one server, or if I want to group all servers together as a single object. Does anyone know of a way to apply this with a wildcard and have it evaluate each host independently of the others?

0 Karma

paranjith
Explorer

Try this:

 index=perfmon host=* counter="% Processor Time" 
 | timechart span=5min avg(Value) 
 | predict "avg(Value)" as prediction algorithm=LLP holdback=2 future_timespan=2 period=288 upper95=upper95 lower95=lower95 
 | `forecastviz(4, 2, "avg(Value)", 95)` 
 | eval isOutlier=if('avg(Value)' > 'upper95(prediction)', 1, 0)
 | table host, isOutlier | search isOutlier=1

For the trigger conditions, set the alert to trigger when the number of results is greater than 0, and to trigger once for each result. Enable throttling if you don't want to receive multiple email alerts.

0 Karma

clowdmike
New Member

That doesn't seem to do the trick either.

I ran this search on three hosts individually (server1, server2, server3). Then I ran it with server* as I originally had it (which averages the data from the three servers), and with your modification (line 6).

The host field is returned with your modification, but its value is null, and the search only detects outliers in the combined average. (I verified this by including: table host, isOutlier, avg(Value).) The CPU usage matched the average of the three servers rather than any one server's CPU at the time of an outlier.

0 Karma
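
A note on why host comes back null: timechart without a by clause collapses every matching host into a single avg(Value) series, so there is no per-host field left for predict or the later eval to work with. One way to keep hosts separate is sketched below. It is untested against this data and it is not the LLP forecast from the original search: it swaps predict for a per-host rolling baseline (mean plus two standard deviations over the previous 288 five-minute buckets) computed with streamstats. The field names avg_cpu, baseline, sd and upper, and the 2-sigma threshold, are assumptions chosen for illustration, and the search time range is assumed to cover at least the last 24 hours so the window can fill.

 index=perfmon host=* counter="% Processor Time"
 | bin _time span=5m
 | stats avg(Value) as avg_cpu by _time, host
 | sort 0 host, _time
 | streamstats window=288 current=f global=f avg(avg_cpu) as baseline stdev(avg_cpu) as sd by host
 | eval upper=baseline + 2*sd
 | eval isOutlier=if(avg_cpu > upper, 1, 0)
 | where isOutlier=1 AND _time >= relative_time(now(), "-30m")
 | table _time, host, avg_cpu, upper

Here global=f together with the by clause gives each host its own window, and current=f keeps the bucket being tested out of its own baseline. Because every result row carries a host value, the "trigger for each result" alert setting with throttling on host should then fire per server rather than on the combined average.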