Splunk Search

Splunk search alert to check whether CPU utilization is high and send an email if 5 of 6 intervals are bad samples?

Satheesh_red
Path Finder

Hi,

I need an alert query that monitors CPU usage every 5 minutes and sends an email if 5 of 6 samples are bad. That is, if CPU utilization is greater than 95% in 5 out of 6 intervals (with a 5-minute gap between intervals), the alert should trigger an email with High importance.

What we can't get working is the part of the query that checks for 5 of 6 bad samples. Please help me out with this.

[Screenshot: Satheesh_red_1-1689676256056.png]

PickleRick
SplunkTrust

The first question is: do you have data in your Splunk instance from which you can find that? And is this data of sufficient quality?

Show us a sample.

Satheesh_red
Path Finder

Hi @PickleRick 

Yes, we have months of data, and it is sufficient and accurate: CPU data is loaded into our instance from a number of systems every 5 minutes.

PickleRick
SplunkTrust

OK. Without digging too deeply into making the requirements more precise, and assuming that you want to alert if at least 5 out of 6 consecutive measurements are "bad", you can do this in two different ways.

1) Assuming you only want the latest state: you need to search at least half an hour into the past (6*5m=30m); the example below uses -35m.

<your initial search> earliest=-35m
| head 6
| stats count(eval(your_condition_for_cpu_utilization_here)) as count
| eval result=if(count>=5,"High utilization","Normal")
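To make approach 1 concrete, here is a rough Python model of what that pipeline computes. It is a sketch under stated assumptions, not Splunk code: the CPU samples are assumed to be plain numbers listed newest first (Splunk's default event order), and the threshold of 95 is taken from the original question as a stand-in for your_condition_for_cpu_utilization_here. The function name latest_state is hypothetical.

```python
# Rough Python model of approach 1 (not Splunk itself).
# Assumptions: samples are numbers, newest first; 95.0 stands in for
# your_condition_for_cpu_utilization_here.
THRESHOLD = 95.0


def latest_state(cpu_samples, n=6, min_bad=5, threshold=THRESHOLD):
    """Mimic: | head 6 | stats count(eval(cpu > threshold)) as count
    | eval result=if(count>=5, "High utilization", "Normal")"""
    window = cpu_samples[:n]  # | head 6 -> the n most recent samples
    bad = sum(1 for cpu in window if cpu > threshold)  # count(eval(...))
    return "High utilization" if bad >= min_bad else "Normal"


print(latest_state([97, 96, 40, 99, 98, 96, 10]))  # High utilization (5 of 6 bad)
print(latest_state([97, 40, 41, 99, 98, 96]))      # Normal (4 of 6 bad)
```

Note that head 6 looks at the six newest events overall, regardless of which host they came from.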

2) Assuming you want to track it over a longer period:

<your initial search> earliest=some_longer_time_ago
| streamstats window=6 count(eval(your_condition_for_cpu_utilization_here)) as count
| eval result=if(count>=5,"High utilization","Normal")
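Approach 2 differs in that streamstats emits a result for every event, each computed over a sliding window of the current event plus up to five events before it in processing order. A minimal Python model of that behaviour, assuming numeric samples and the same hypothetical threshold of 95 (rolling_state is a made-up name, and this is a model, not Splunk):

```python
from collections import deque


# Rough Python model of approach 2 (not Splunk itself).
def rolling_state(cpu_samples, window=6, min_bad=5, threshold=95.0):
    """Mimic: | streamstats window=6 count(eval(cpu > threshold)) as count
    | eval result=if(count>=5, "High utilization", "Normal")"""
    recent = deque(maxlen=window)  # holds at most the last `window` flags
    results = []
    for cpu in cpu_samples:
        recent.append(cpu > threshold)  # True marks a "bad" sample
        count = sum(recent)             # bad samples in the current window
        results.append("High utilization" if count >= min_bad else "Normal")
    return results


print(rolling_state([99, 99, 99, 99, 99, 10, 10]))
```

A row can only reach five bad samples once at least five samples have been seen. Without a by host clause, samples from different hosts also share one window.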

You might add more logic to split it by host or something like that.

Satheesh_red
Path Finder

Thanks @PickleRick for helping out here.

Based on your suggestion, I constructed the query shown below.

However, I also need to meet the following requirements. Please let me know how.

Q1: How can I check this per host?

Q2: With head 6, each check should cover one event per 5-minute interval. Does it?

Q3: Can we display the CPU percentage values of those last six intervals in a new column? For example: 86, 92, 89, 45, 99, 90

index=* sourcetype=cpu host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| head 6
| stats count(eval(CpuUsage > 85)) as count
| eval result=if(count>=5,"High utilization","Normal")
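The eval line in this query relies on coalesce() returning its first non-null argument: an expression such as 100-PercentIdleTime is null when that field is absent from an event, so whichever idle-time field exists is used. Below is a small Python model of that step, assuming events are dicts with numeric idle values that have already gone through the rename (cpu_usage and the host name web01 are hypothetical).

```python
# Rough Python model of:
#   eval CpuUsage=coalesce(100-Percent_Idle_Time, 100-PercentIdleTime)
# Assumption: each event is a dict, already renamed from "%_Idle_Time",
# and idle values are numeric (or numeric strings).
def cpu_usage(event):
    for field in ("Percent_Idle_Time", "PercentIdleTime"):  # coalesce order
        value = event.get(field)
        if value is not None:       # first field that is present wins
            return 100 - float(value)
    return None                     # neither field present: CpuUsage is null


print(cpu_usage({"Percent_Idle_Time": "7.5"}))  # 92.5
print(cpu_usage({"PercentIdleTime": 40}))       # 60.0
print(cpu_usage({"host": "web01"}))             # None
```

Also note that this query tests CpuUsage > 85, whereas the first post asked for more than 95%.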

PickleRick
SplunkTrust

That's a completely different story. And more complicated.

The 'head' command only returns the first N results. It doesn't distinguish by any field, so you have to count the results for each host yourself and limit them some other way:

| streamstats count by host
| where count<=6

That's how you get the events you then run stats on (and you don't use the "head" command anymore!).
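Because Splunk returns events newest first by default, streamstats count by host assigns 1 to each host's most recent event, 2 to the next one, and so on; where count<=6 then keeps the six newest events per host. A rough Python model of those two lines, not Splunk itself (last_n_per_host and the hosts "a" and "b" are hypothetical):

```python
from collections import defaultdict


# Rough Python model of: | streamstats count by host | where count<=6
# Assumption: events are dicts listed newest first, as Splunk returns them.
def last_n_per_host(events, n=6):
    running = defaultdict(int)  # per-host running count, like streamstats
    kept = []
    for event in events:
        running[event["host"]] += 1
        if running[event["host"]] <= n:  # where count<=6
            kept.append(event)
    return kept


events = [{"host": h, "CpuUsage": 90} for h in ["a", "b"] * 8]  # 8 events per host
kept = last_n_per_host(events)
print(sum(e["host"] == "a" for e in kept), sum(e["host"] == "b" for e in kept))  # 6 6
```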

Then you can do your stats by host:

| stats count(eval(CpuUsage > 85)) as count by host
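Per host, that stats call is just a count of bad samples among the (at most) six events kept for each host. The Python model below ends with a filter on count >= 5; doing the same in SPL with something like | where count>=5 is an assumption about how you would want to trigger the alert. Function and host names are hypothetical.

```python
from collections import defaultdict


# Rough Python model of: | stats count(eval(CpuUsage > 85)) as count by host
def bad_count_by_host(events, threshold=85.0):
    counts = defaultdict(int)
    for event in events:
        # adding 0 still creates the host's row, so hosts with no bad samples show 0
        counts[event["host"]] += int(event["CpuUsage"] > threshold)
    return dict(counts)


sample = (
    [{"host": "db01", "CpuUsage": v} for v in (91, 88, 97, 60, 90, 86)]
    + [{"host": "web01", "CpuUsage": v} for v in (40, 92, 30, 35, 20, 50)]
)
counts = bad_count_by_host(sample)
print(counts)                                    # {'db01': 5, 'web01': 1}
print([h for h, c in counts.items() if c >= 5])  # ['db01']
```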

You can also add the values of CpuUsage to that command. So instead of the last line you can do:

| stats count(eval(CpuUsage > 85)) as count values(CpuUsage) as CpuUsage by host
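One detail that matters for Q3: according to the Splunk documentation, values() returns the distinct values of a field in lexicographic order, while list() returns all values, duplicates included, in the order of the input events. If you want the six readings in event order as in the example, list(CpuUsage) is probably closer to what you want. A rough Python model of the difference (function names are hypothetical, and this only models the documented behaviour):

```python
# Rough Python models of two stats functions (not Splunk itself).
def splunk_values(samples):
    """values(X): distinct values, sorted lexicographically (as strings)."""
    return sorted({str(s) for s in samples})


def splunk_list(samples):
    """list(X): every value, duplicates kept, in input order."""
    return [str(s) for s in samples]


readings = [86, 92, 89, 100, 99, 92]
print(splunk_values(readings))  # ['100', '86', '89', '92', '99']
print(splunk_list(readings))    # ['86', '92', '89', '100', '99', '92']
```

Lexicographic order is why "100" sorts before "86" in the values() output.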

Satheesh_red
Path Finder

Hi @PickleRick 

Do you know a way to average the CPU values from the six most recent events? I'm trying the query below, but avg(values(CpuUsage)) isn't working.

index=* sourcetype=cpu CPU=all host=* earliest=-35m
| rename "%_Idle_Time" as Percent_Idle_Time
| eval CpuUsage=coalesce(100-Percent_Idle_Time,100-PercentIdleTime)
| streamstats count by host
| where count<=6
| stats avg(values(CpuUsage)) as "Average of CpuUsage last 6 intervals(5mins range)" by host

Regards,
Satheesh

PickleRick
SplunkTrust

You can't do "avg(values(X))". values() will produce a multivalued field. Why not just avg(CpuUsage)?
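avg(CpuUsage) averages every event in each group directly. Even if averaging the output of values() were allowed, it would give a different number, because values() drops duplicate readings. A small Python comparison with made-up numbers for a single host (purely illustrative):

```python
# Compare averaging every reading (what avg(CpuUsage) does per host)
# with averaging only the distinct readings (what averaging values() would mean).
readings = [90, 90, 90, 60, 60, 90]  # illustrative CpuUsage values for one host

avg_all = sum(readings) / len(readings)
distinct = set(readings)
avg_distinct = sum(distinct) / len(distinct)

print(avg_all)       # 80.0
print(avg_distinct)  # 75.0
```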

Satheesh_red
Path Finder

Thanks @PickleRick

avg(CpuUsage) worked. 

Satheesh_red
Path Finder

Thank you so much @PickleRick for your help. It works for my requirement now.

I appreciate your time and support.