Windows Server - High CPU for process - Splunk Cloud

dkgs
Communicator

Hello,

We need to find the process that consumes the most CPU on a Windows machine, not the total CPU usage.

Please help us implement this. Is PercentProcessorTime the right field to use in the Splunk query, or how else should we calculate it?

 

[WMI:ProcessesCPU]
interval = 60
wql = SELECT Name, PercentProcessorTime, PercentPrivilegedTime, PercentUserTime, ThreadCount FROM Win32_PerfFormattedData_PerfProc_Process WHERE PercentProcessorTime>0
disabled = 0

The query below does not give the right output; it gives what looks like the sum of all processes, above 100. We need to find the process that uses the most CPU.

index="index1" host=windows2 source="WMI:ProcessesCPU" | WHERE NOT Name="_Total" | WHERE NOT Name="System" | WHERE NOT Name="Idle" | streamstats dc(_time) as distinct_times | head (distinct_times == 1) | stats latest(PercentProcessorTime) as CPU% by Name | sort -ProcessorTime |eval AlertStatus=if('CPU%'> 90, "Alert", "Ignore") |search AlertStatus="Alert"

 

 


Richfez
SplunkTrust

Right.

Your search does exactly what you describe, though I have no idea what the streamstats/head sequence is doing there in the middle.

index="index1" host=windows2 source="WMI:ProcessesCPU" 
| WHERE NOT Name="_Total" | WHERE NOT Name="System" | WHERE NOT Name="Idle" 
| streamstats dc(_time) as distinct_times | head (distinct_times == 1) 
| stats latest(PercentProcessorTime) as CPU% by Name 
| sort -ProcessorTime 
| eval AlertStatus=if('CPU%'> 90, "Alert", "Ignore") 
| search AlertStatus="Alert"

 

So let's just start from the stats forward.

The stats right in the middle is looking at the latest PercentProcessorTime by Name.  So for each "Name", get the latest PercentProcessorTime and call it "CPU%".  I'd not rename it there, though, just because the "%" symbol makes things harder to work with.

Then you sort by ... a field that doesn't exist.  There is no "ProcessorTime" field; you can prove this to yourself by temporarily removing all SPL after the stats and just looking.

Finally, you create a new field called "AlertStatus" that's "Alert" if the CPU% was above 90, or "Ignore" if it wasn't, and you search for where your new field is "Alert".
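
As a minimal sketch of that same alert logic with a plain field name, the eval/search pair can be collapsed into a single where. CPU is just an illustrative field name here, and this sketch leaves out the streamstats/head pair, so latest() looks across whatever time range the search runs over:

```spl
index="index1" host=windows2 source="WMI:ProcessesCPU" NOT Name IN ("Idle", "_Total", "System")
| stats latest(PercentProcessorTime) as CPU by Name
| where CPU > 90
```

Without the "%" in the name there is no need for the single quotes around the field in the comparison.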

So to show the top one...

... (whatever all that code before this does)
| stats latest(PercentProcessorTime) as PercentProcessorTime by Name 
| sort 0 - PercentProcessorTime
| head 1
| rename PercentProcessorTime AS CPU%

 

Or something like that.  It might take minor tweaking, but that should get you on the right path.

-rich

 

 

 

 


dkgs
Communicator

@Richfez  Thank you for the response. I tried the query you shared, but it shows one particular process above 100, which is not correct. If you could help with the correct formula to calculate the CPU of each process, that would be great.

Thanks in advance


Richfez
SplunkTrust

Here, try this one.  Obviously, change the index= and source and whatever to match yours. And notice there are TWO places to change it now.  You'll see them if you look.

source="WMI:WMITest" host="LAPTOP-OTP637UT" index="windows" PercentProcessorTime>0 NOT Name IN ("Idle", "_Total", "System") 
    [ search source="WMI:WMITest" host="LAPTOP-OTP637UT" index="windows"
    | stats max(_time) as _time 
    | eval earliest = _time - 15 | fields earliest
    | format ] 
| stats latest(_time) as _time, latest(PercentProcessorTime) as CPU by Name 
| sort 0 - CPU | head 1
| rename CPU AS CPU%

I know for a fact it works, I have proof.  Here's my latest peak CPU user.

Name		_time					CPU%
firefox#7	2020-08-27 11:05:19.719	11

(I doubt the columns will stay lined up, but I tried...  anyway, 11% CPU.)

I redid your entire search.

1) Switched your `| where NOT ...` portions into a single `NOT Name IN (<list of items>)`.  Way easier to read this way.

2) Rebuilt what I think the streamstats was doing (picking the latest timeframe of when this was run?) into a subsearch that finds the latest time you have matching data for, sets "earliest" to 15 seconds before that, and then sends that into the main search as a filter.  I'll mention more on this later.

3) Then I just do more or less what I suggested: stats latest(PercentProcessorTime) by Name, then a bit of filtering/cleanup afterwards.  You can see the interim "all the results" by just removing those last three commands, the sort, head and rename.  That will give you the list of all the CPU-using items from that last data collection so you can confirm it's doing it the right way.

 

So, an explanation of the subsearch I did, and its shortcomings:

It looks in the index (again, change that!) for the latest time you have data matching what you need.  In the subsearch, you'll notice it doesn't filter out "Idle" and the others, because it doesn't matter: even if everything was at 0% CPU the last time it took a measurement, you still want to return that little time slice of when the measurements were taken!

Anyway, it takes that most recent _time, subtracts 15 seconds from it, and because it's a subsearch, sends that back into the main search as "earliest=<whatevertime>".
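
To check what the subsearch hands back before trusting the filter, it can be run on its own. This is a minimal sketch that reuses the index, source and host from the example above (swap in yours); latest_event and earliest_readable are made-up field names for display only:

```spl
index="windows" source="WMI:WMITest" host="LAPTOP-OTP637UT"
| stats max(_time) as latest_event
| eval earliest = latest_event - 15
| eval earliest_readable = strftime(earliest, "%Y-%m-%d %H:%M:%S")
| table latest_event earliest earliest_readable
```

The earliest column should be an epoch timestamp 15 seconds before the newest event, which is the value the main search ends up using as its earliest time.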

So, this assumes that the data collection runs in its entirety in under 15 seconds, and that each time it collects is separated by *more* than 15 seconds.  E.g. You are running it every 60 seconds (that's what your config says), and that by 15 seconds after it starts collecting the first data from any particular once-per-minute run, it's collected all the data from that run.  Like it starts at 4 minutes and 12 seconds past the hour this time, so it needs to be done by 4 minutes and 27 seconds after the hour.  You can expand that to 30 or maybe even 45 seconds if you need to?  If you have to go past that, we'll need to adjust so that it does a big list of times by host, so you collect the latest measurements per host instead of one overall.  Not hard, but .. might impact other things so let's not do this unless we have to.  🙂
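
If the per-host version does turn out to be necessary, one possible shape is sketched below. It swaps the subsearch for eventstats instead of building a list of times by host, so it is a different technique from the one described above, and it still assumes the 15-second collection window. latest_time is an illustrative field name, and the index and source are the ones from the example:

```spl
index="windows" source="WMI:WMITest"
| eventstats max(_time) as latest_time by host
| where _time >= latest_time - 15
| search PercentProcessorTime>0 NOT Name IN ("Idle", "_Total", "System")
| stats latest(PercentProcessorTime) as CPU by host, Name
| sort 0 - CPU
| dedup host
```

After the sort, dedup keeps the first row it sees for each host, which is that host's highest-CPU process. The trade-off is that the search now pulls every event in the selected time range before filtering, so keep the time range narrow.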

 

If this is not your answer, ... then there's a communication problem and the question I'm hearing you ask isn't the question you are trying to ask.  In which case... just try explaining again with maybe a few small examples?

If this does get you where you need to be, please accept it as a solution so later folks who wander across this answer will know it worked!

Thanks, and happy Splunking!

Rich

 


 


Richfez
SplunkTrust

If you could provide some sample data, and what it actually looks like after the manipulations, that would be great.

Specifically, what do a few results from this look like?

 

index="index1" host=windows2 source="WMI:ProcessesCPU" | WHERE NOT Name="_Total" | WHERE NOT Name="System" | WHERE NOT Name="Idle" 

 

 

I'm positive once I see that data I'll be able to get you a better answer (one with the probable facepalm removed. 🙂 )

 

Y'know, .... I could just set up that same input on my laptop.  Hmm.   Then I could see it for sure.  🙂

EDIT: (Oh, eww.  WMI.  Maybe not directly then.  Still, if you can provide data, it won't matter. )

-Rich
