Create an alert based on CPU being at X% for a span of X minutes

rahulkumarfgf
Explorer

Hi Everyone!
I have researched this issue and found a few partial solutions. I followed this link:
https://answers.splunk.com/answers/557838/create-an-alert-based-on-cpu-being-at-95-for-a-spa.html

and wanted to know whether I can use "%_Processor_Time" instead of CPUPct, as I am not able to extract the "CPUPct" field.

Also, I followed this link: https://answers.splunk.com/answers/693250/how-do-i-alert-if-cpu-is-greater-than-97-for-more.html

Here, I wanted to understand what "instance=_Total" means.

Also, which one of the accepted answers is better to use? The queries I used are as follows:
SPL 1:

index="perfmoncpu"
| bin _time span=1m
| stats max(%_Processor_Time) as PercentProcessorTime by host _time
| eval PercentProcessorTime = round(PercentProcessorTime, 2)
| eval overload = if(PercentProcessorTime >= 90, 1, 0)
| streamstats current=f last(overload) as prevload by host
| eval newgroup = case(isnull(prevload), 1, prevload != overload, 1, true(), 0)
| streamstats sum(newgroup) as groupno by host
| eventstats count as LoadDuration by host groupno
| where overload = 1 AND LoadDuration >= 10
| table host _time PercentProcessorTime LoadDuration

SPL 2:

index="perfmoncpu" source="PerfmonMk:CPU" instance=_Total
| sort 0 _time
| streamstats time_window=15min avg(cpu_load_percent) as last15min_load count by host
| eval last15min_load = if(count < 90, null(), round(last15min_load, 2))
| where last15min_load >= 90
| table host, cpu_load_percent, last15min_load

I used count < 90 because the SPL above reports a count of 90 throughout.

Please let me know if you guys have any further questions.

Thank You!

PS: I am a newbie trying to learn Splunk!

to4kawa
Ultra Champion

ans1:
instance=_Total matches events whose instance field has the value _Total.

ans2:
%_Processor_Time can be used as a field name.

ans3:
Try both and compare them in the Job Inspector.

If you provide sample logs, we can write a query.
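
To illustrate ans1 and ans2, here is a minimal sketch that lists the instance values present in the data and reads %_Processor_Time inside eval. It assumes the index and source names from the question. In Windows Perfmon processor data, _Total is normally the aggregate across all logical processors, while numbered instances (0, 1, ...) are individual cores. The names cpu and max_cpu are illustrative only.

```spl
index="perfmoncpu" source="PerfmonMk:CPU"
| eval cpu = round('%_Processor_Time', 2)
| stats count max(cpu) as max_cpu by instance
```

Inside eval and where expressions, a field name that contains special characters such as % needs single quotes; double quotes would turn it into a string literal.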

rahulkumarfgf
Explorer

Hi @to4kawa ,
Thank you for your response. I am using both; however, I am not sure what exactly to check in the Job Inspector to tell whether the SPL is correct.

Regarding the logs, I am trying to find a way to submit them. I will try to add a link to them.

Thank you!

to4kawa
Ultra Champion

The query with the shorter elapsed time is the better one.
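
Elapsed time speaks to efficiency rather than correctness. One way to check the logic itself, sketched here under assumptions, is to run the run-detection steps from SPL 1 against synthetic data whose answer is known in advance. The host name testhost, the counter n, and the 95/40 load values are made up for the test: 30 one-minute samples, of which minutes 6 through 20 are overloaded.

```spl
| makeresults count=30
| streamstats count as n
| eval _time = relative_time(now(), "@m") - (30 - n) * 60
| eval host = "testhost"
| eval PercentProcessorTime = if(n >= 6 AND n <= 20, 95, 40)
| eval overload = if(PercentProcessorTime >= 90, 1, 0)
| streamstats current=f last(overload) as prevload by host
| eval newgroup = case(isnull(prevload), 1, prevload != overload, 1, true(), 0)
| streamstats sum(newgroup) as groupno by host
| eventstats count as LoadDuration by host groupno
| where overload = 1 AND LoadDuration >= 10
| table host _time PercentProcessorTime LoadDuration
```

If the logic behaves as intended, only the 15 overloaded minutes should remain, each with a LoadDuration of 15. Shortening the overloaded stretch to fewer than 10 minutes should return no rows.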
