Alerting

Need a Splunk alert that fires when CPU %, memory use %, or disk use % > 75% (while also indicating the top offending processes)

spluzer
Communicator

Hello Splunkers. Noob here. I have an alert that fires when any of the three metrics (listed in the title) goes above 75%. I just need to add to the alert which top offending processes are causing the overages. Here is my query so far, which does work to show when any of the 3 main metrics goes above 75%.

index=blah (sourcetype="PerfDisk" OR sourcetype="PerfCPU" OR sourcetype="PerfMem" OR sourcetype="PerfProcess") (host=blah OR host=blah OR host=blah OR host=blah ) earliest=-5m
| stats avg(%CommittedBytes) as mem_use_prcnt
    avg(cpuLoadPerc) as cpu_load_prcnt
    avg(%DiskTime) as disk_utilization_prcnt
    by host
| eval fire_it_up = case(cpu_load_prcnt > 75, 1,
    mem_use_prcnt > 75, 1,
    disk_utilization_prcnt > 75, 1,
    true(), 0)
| where fire_it_up > 0
| table host mem_use_prcnt cpu_load_prcnt disk_utilization_prcnt

Any ideas on getting the top offending processes causing the overages? Any help is much appreciated.

1 Solution

spluzer
Communicator

Here is what I ended up doing:

index=win sourcetype="Perf:logDisk" instance!=_Total (host=myhost) earliest=-5m
| eval volume = instance
| stats avg(%_Disk_Time) as diskUse% by volume host
| join type=left host
[| search index=win sourcetype="Perf:Process" category=%_Processor_Time=* NOT(instance IN(_Total, Idle)) (host=myhost) earliest=-5m
| stats avg(%_Processor_Time) as %_Processor_Time by host instance
| sort -%_Processor_Time
| streamstats count by host
| where count=1
| eval %_Processor_Time=round('%_Processor_Time')
| eval Additional_InfoCPU = "Top Resource Task=" . instance . ", Task Time=" . '%_Processor_Time'
| fields host Additional_InfoCPU ]

Then I repeated that for a bunch of other metrics (mem %, CPU %, etc.) in separate subsearches.

Then:

| eval mem_use_% = round('mem_use_%', 2)
| eval cpu_load_% = round('cpu_load_%', 2)
| eval disk_utilization_% = round('disk_utilization_%', 2)
| eval Individual_DiskUse_% = round('Individual_DiskUse_%', 2)
| eval fire_alert = case('cpu_load_%' > 75, 1,
    'mem_use_%' > 75, 1,
    'Individual_DiskUse_%' > 75, 1,
    true(), 0)
| where fire_alert > 0
| stats values(volume) values(DiskUse_%) by <everything you want>
| table <all of it>
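
For readers adapting the subsearch above, here is a minimal sketch of another way to keep only the busiest process per host: sort descending, then dedup by host, in place of the streamstats count / where count=1 pair. It assumes the same index, sourcetype, host and field names as the post. The alias cpu_pct is introduced here purely for illustration and is not from the original, and the whole thing is untested against real Perf:Process data.

```
index=win sourcetype="Perf:Process" NOT (instance IN ("_Total", "Idle")) host=myhost earliest=-5m
| stats avg(%_Processor_Time) as cpu_pct by host instance
| sort 0 -cpu_pct
| dedup host
| eval Additional_InfoCPU = "Top Resource Task=" . instance . ", Task Time=" . round(cpu_pct)
| fields host Additional_InfoCPU
```

Renaming to a plain alias in stats avoids having to single-quote a field name containing % in later eval expressions. dedup host keeps the first row per host in the sorted order, so it only ever returns one process per host. If the top few processes per host are wanted instead, the streamstats count by host approach from the post extends more naturally (for example where count<=3).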

Sukisen1981
Champion

Hi @spluzer
"I just need to add into the alert what top offending processes are causing the overages" ... well, then you need to capture the process names under CPU, memory, or disk. I am sure they are mentioned in your events somewhere?
You can't just go by sourcetype; all that would tell you is that when CPU spikes above 75%, the data came from the PerfCPU sourcetype.
Perhaps you have more granular details than that, such as the process names recorded under those sourcetypes?
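
One quick way to check whether the events carry process names is sketched below. It assumes, as in the accepted solution, that per-process counters arrive under sourcetype Perf:Process with the process name in the instance field. The index and host values are placeholders, and the field layout may differ depending on how the inputs are configured.

```
index=win sourcetype="Perf:Process" host=myhost earliest=-5m
| stats count by host instance
| sort 0 -count
```

If instance comes back with values such as individual executable names alongside _Total and Idle, those are the names that can be ranked to find the top offender.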

richgalloway
SplunkTrust

@spluzer If your problem is resolved, please accept the answer to help future readers.

---
If this reply helps you, Karma would be appreciated.