Splunk Search

How to alert a user when Processor Time exceeds a certain limit for a given period of time

tusharsappal
Explorer

Hello,
I want to check whether my processor usage has exceeded a certain percentage for a given period of time, and then send an alert.
My search query is in the format below. Please guide me and, if possible, correct the query.

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" | timechart span="1s" avg(Value) | search count >10 by host

But I am not sure where in this query to check whether the value exceeds a certain percentage.

lguinn2
Legend

Note that the perfmon data is not collected every second, so a span of "1s" in the timechart does not make sense.
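
To see how often the counter is actually sampled before choosing a span, a search along the following lines could help. It is only a sketch: the `instance="_Total"` filter and the one-hour lookback are assumptions (the filter keeps per-core rows from producing zero-second gaps) and may need adjusting for your inputs.

```
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" instance="_Total" earliest=-1h
| sort 0 host _time
| streamstats current=f window=1 last(_time) as prev_time by host
| eval gap_seconds = _time - prev_time
| stats median(gap_seconds) as MedianIntervalSeconds by host
```

The median gap per host approximates the collection interval, and any timechart or bin span should be at least that large.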

What if you had a search that ran every 10 minutes and alerted for any host that had an average processor time in excess of 90%?

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-13m latest=-3m
| stats avg(Value) as AvgProcessorTime by host
| where AvgProcessorTime > 90

You can set the alert condition to "number of results > 0", as the search returns one result for each host that has a high AvgProcessorTime. Note that there is a 3-minute "lag" in my search (the time range ends 3 minutes ago); this allows time for the data to be collected and returned across the environment. You can eliminate the lag if you want, but that will affect the accuracy, because events that arrive late will be missed.

saurabh_tek
Communicator

@Iguinn this query is precise for detecting sudden CPU spikes, but from an impact perspective the business is more interested in whether the CPU goes beyond the threshold and stays there for 3 or more minutes, which might affect server performance. Any pointers for making it work like that?

rahulkumarfgf
Explorer

Hi @saurabh_tek

I am trying to find a solution to the same problem you mentioned. I hope you were able to resolve it. If so, could you please let me know how you handled it? There are several threads with similar questions, but none of them actually worked.

Thanks!
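
One possible way to express "beyond the threshold and staying there for 3 minutes" is to bucket the data per minute and require every bucket in the window to exceed the limit. This is only a sketch built on the accepted answer's search, not a confirmed solution from this thread. The 90% threshold, the 1-minute buckets, and the exact 3-minute window (shifted back by the same 3-minute lag) are assumptions to adjust, and it presumes the counter is sampled at least once a minute.

```
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-6m@m latest=-3m@m
| bin _time span=1m
| stats avg(Value) as AvgProcessorTime by host, _time
| stats min(AvgProcessorTime) as MinAvgProcessorTime, count as MinutesWithData by host
| where MinAvgProcessorTime > 90 AND MinutesWithData >= 3
```

The first stats produces one average per host per minute; the second keeps only hosts whose lowest per-minute average is still above 90 and that reported data in all three minutes. Scheduled every minute with the alert condition "number of results > 0", it would fire only while the high usage persists across the whole window. A host that stops reporting for a minute will not match, which may or may not be what you want.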

tusharsappal
Explorer

Thanks for the response; the query worked well. I have one more question. As far as I know, Splunk only sends the count of the events that occurred during the time window. Is there any way to send the actual matching content in the email whenever the alert fires? In other words, can the report be made clearer by including the matching text in the email body (not for perfmon data, but when parsing logs)?

Thanks in Advance
Tushar
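
For the log-parsing case, one way to get the matching events into the message body is the `sendemail` search command, which can attach or inline the search results. The following is a hedged sketch: the index, sourcetype, search term, and recipient address are hypothetical placeholders, and it assumes a mail server is already configured for your Splunk instance.

```
index=main sourcetype="app_logs" "ERROR" earliest=-10m
| table _time, host, source, _raw
| sendemail to="ops@example.com" subject="Matching log events" sendresults=true inline=true format=table
```

Here `sendresults=true` includes the results, `inline=true` places them in the body rather than in an attachment, and `format=table` renders them as a table. If the alert is defined as a saved search, the email alert action offers a comparable option to include the results inline, which avoids putting `sendemail` in the search itself.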
