How to alert a user when Processor Time exceeds a certain limit for a given period of time

tusharsappal
Explorer

Hello,
I want to check whether my processor has exceeded a certain percentage for a given period of time, and then send an alert.
My search query currently looks like this. Could you please correct it if possible?

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" | timechart span="1s" avg(Value) | search count >10 by host

But I am not sure where in this query to check whether the value exceeds a certain percentage.


lguinn2
Legend

Note that the perfmon data is not collected every second, so a span of "1s" in the timechart does not make sense.
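If you want to keep the per-interval view from the original question, a minimal sketch is to bucket the data with a span no finer than the perfmon collection interval and apply the threshold with `where`. The original `| search count >10 by host` cannot express the check: `search` has no `by` clause, and the timechart output contains an `avg(Value)` field rather than `count`. A `timechart ... by host` would also spread hosts across columns, which is awkward to filter, so this sketch uses `bin` plus `stats` instead. The 1-minute span and the 90% threshold are assumptions; match the span to the `interval` configured for your perfmon input.

```spl
index=windows sourcetype="perfmon:cputime" counter="% Processor Time"
| bin _time span=1m
| stats avg(Value) as AvgProcessorTime by _time, host
| where AvgProcessorTime > 90
```

Each returned row is one host-minute whose average processor time was above the threshold.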

What if you had a search that ran every 10 minutes and alerted for any host that had an average processor time in excess of 90%?

index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-13m latest=-3m
| stats avg(Value) as AvgProcessorTime by host
| where AvgProcessorTime > 90

You can set the alert condition to "number of results > 0", since the search returns one result for each host with a high AvgProcessorTime. Note that there is a 3-minute "lag" in my search (latest=-3m); this allows time for the data to be collected and made searchable across the environment. You can eliminate it if you want, but then late-arriving data may be missing from the window, which will affect the accuracy.
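One possible refinement, offered as a suggestion rather than as part of the original answer: snapping the time bounds to the minute with `@m` makes consecutive 10-minute runs cover adjacent, non-overlapping windows, so no data is counted twice or skipped if the scheduler starts a run a few seconds late. This assumes the search is scheduled every 10 minutes (for example with the cron expression `*/10 * * * *`).

```spl
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-13m@m latest=-3m@m
| stats avg(Value) as AvgProcessorTime by host
| where AvgProcessorTime > 90
```

For example, a run starting at 10:00:05 searches 09:47:00 to 09:57:00, and the next run at 10:10:03 searches 09:57:00 to 10:07:00.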

saurabh_tek
Communicator

@Iguinn This query is precise for detecting sudden CPU spikes, but from an impact perspective the business is more interested in whether the CPU goes beyond the threshold and stays there for 3 or more minutes, which might impact server performance. Any pointers for making it work like that?

rahulkumarfgf
Explorer

Hi @saurabh_tek

I am trying to find a solution to the same problem you mentioned. I hope you were able to resolve it. If so, could you please let me know how you handled it? There are several threads with similar questions, but none of the answers actually worked.

Thanks!
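A minimal sketch for the "stays above the threshold for 3 or more minutes" case, assuming perfmon data arrives at least once per minute: compute a per-minute average for each host over the last three complete minutes (keeping the 3-minute lag from the accepted answer), then keep only hosts whose lowest per-minute average is still above 90%. If even the minimum exceeds the threshold, the CPU stayed above it in all three minutes. Requiring three buckets prevents a host with missing data from triggering the alert. The threshold and window size are illustrative; schedule the search every minute (for example `* * * * *`) and alert when the number of results is greater than 0.

```spl
index=windows sourcetype="perfmon:cputime" counter="% Processor Time" earliest=-6m@m latest=-3m@m
| bin _time span=1m
| stats avg(Value) as AvgProcessorTime by _time, host
| stats count as Minutes, min(AvgProcessorTime) as MinProcessorTime by host
| where Minutes >= 3 AND MinProcessorTime > 90
```

This measures "stays above" in terms of per-minute averages; replacing `avg(Value)` with `min(Value)` in the first `stats` makes the check stricter, requiring every sample to exceed the threshold. Because a long spike will match on consecutive runs, you may also want to enable throttling on the alert so it does not send an email every minute.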

tusharsappal
Explorer

Thanks for the response; the query worked well. I have one more question. So far I only know how to have Splunk send the count of events that occurred during the time window. Is there any way to send the actual matching content in the email whenever the alert fires? In other words, can we make the reporting more intuitive and clear by sending the actual matching text in the email body? (Not for perfmon data, but for parsed log data.)

Thanks in advance
Tushar
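For log data, one approach (a sketch, not the only option) is to make the alert search return the matching events themselves as a table, and then include those results in the email. In the alert's "Send email" action you can choose to include the results inline or as an attachment; alternatively, the `sendemail` command can be called at the end of the search, as below. The index name, search term, and recipient address are hypothetical placeholders, and the sketch assumes email settings are already configured on the search head.

```spl
index=app_logs "ERROR" earliest=-10m@m latest=@m
| table _time host source _raw
| sendemail to="ops-team@example.com" subject="Errors found in application logs" sendresults=true inline=true format=table
```

If you use the alert action instead of `sendemail`, tokens such as `$result.host$` can also place field values from the first result into the email subject or body.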
