I have started collecting Process information on a virtual machine that has 1 processor. I am seeing %_Processor_Time statistics of upwards of 1000%. I understand that you could get a percentage greater than 100% on a machine with multiple sockets/cores, but that doesn't seem to be the case here. Can anyone explain what it is that I'm seeing? Running perfmon on the VM looks normal, so how can a process use a percentage of the CPU that is so high?
This has now been fixed, and the fix is available for download in 8.1.7+, 8.2.3+, and 8.3.x and later.
It is listed in the Fixed Issues section of the release notes, for example for version 8.2.3 - https://docs.splunk.com/Documentation/Splunk/8.2.3/ReleaseNotes/Fixedissues
Please check the release notes for the version you want to upgrade to (SPL-210455).
I have just run into a similar issue and it may be down to a limitation in how Perfmon functions. Explanation:
You see values greater than expected because of a design limitation in perflib V1/PDH. To calculate the percentage of CPU usage, PDH needs two samples (each with a raw value and a timestamp). The problem is that PDH uses only the instance name to match processes, so it can sometimes pair two samples that come from different processes.
For example, in the table below there are 3 processes (black, green and red). Each sample is 1 second apart, and the values represent the total number of milliseconds each process has run for since it started. The first row indicates how PDH names the process:
            X         X#1       X#2
            (black)   (green)   (red)
Sample 1    0         0         0
Sample 2    20        10        500
Sample 3    40        20        1000    Process in green is deleted, so process in red becomes X#1
Sample 4    60        1500      -       It looks like process X#1 ran for 1480 ms in this 1 s interval!
Sample 5    80        2000      -       Everything seems back to normal now.
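To make the arithmetic concrete, here is a minimal Python sketch of the matching problem described above. It is not PDH's actual code; the function names are made up for illustration, and it simply assumes the rate is the difference between two cumulative samples divided by the interval, matched by instance name only.

```python
# Cumulative CPU time in ms per instance name, one dict per 1-second sample.
# Values are taken from the table above.
samples = [
    {"X": 0, "X#1": 0, "X#2": 0},
    {"X": 20, "X#1": 10, "X#2": 500},
    {"X": 40, "X#1": 20, "X#2": 1000},
    {"X": 60, "X#1": 1500},  # green exited; red is now reported as X#1
    {"X": 80, "X#1": 2000},
]


def percent_processor_time(prev_ms, curr_ms, interval_ms=1000):
    """Percent of the interval spent on CPU, from two cumulative samples."""
    return (curr_ms - prev_ms) * 100.0 / interval_ms


def rates_by_instance_name(samples, interval_ms=1000):
    """Pair consecutive samples by instance name only (the flawed matching)."""
    results = []
    for prev, curr in zip(samples, samples[1:]):
        row = {}
        for name, value in curr.items():
            if name in prev:
                row[name] = percent_processor_time(prev[name], value, interval_ms)
        results.append(row)
    return results


rates = rates_by_instance_name(samples)
for number, row in enumerate(rates, start=2):
    print(f"Sample {number}: {row}")
```

In the Sample 4 row, X#1 comes out at 148% because the previous value (20) belongs to the green process and the current value (1500) belongs to the red one. Matching by PID instead of instance name would avoid pairing samples from different processes.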
Ref: Info about Counter "% Processor Time"? (microsoft.com)
With a possible fix being listed here: Perfmon: Identifying processes by PID instead of instance - Microsoft Tech Community
I haven't tested the fix yet but will report back with results if it does get implemented.
The VM being monitored has 1 core and 1 logical processor. The % Processor Time value is 17,967%. Going by the NumberOfProcessors x 100 formula, the machine would need 180 sockets/cores to reach that number. I could see 101% due to threading issues, but 18,000%? The only way I could see that being accurate is if the VM had knowledge of the underlying hardware (the VM hosts), but I'm fairly sure that is not the case.
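As a quick check of that arithmetic, here is a small Python sketch. It assumes the NumberOfProcessors x 100 ceiling mentioned above; the function names are just for illustration.

```python
import math


def max_expected_percent(logical_processors):
    """Upper bound for a process's % Processor Time under the N x 100 rule."""
    return logical_processors * 100


def processors_needed(percent):
    """Smallest processor count for which the observed value would be possible."""
    return math.ceil(percent / 100)


observed = 17967
print(max_expected_percent(1))      # ceiling for a 1-processor VM
print(processors_needed(observed))  # processors needed to explain the value
```

With 1 logical processor the ceiling is 100, so a reading of 17,967 cannot be a real measurement of that process.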
Do you have access to the actual logs? Find an event that is over 100%. If you open the event and click the Event Actions drop-down, you may be able to view the source. Here is an example of one of mine:
12/24/2019 15:05:44.584 -0500
collection=Process
object=Process
counter="% Processor Time"
instance=splunk-perfmon
Value=2.9410862311424606
If your event value is different from the value in the raw log, you may have an issue with the parser.
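If you want to check this outside the UI, a rough Python sketch like the one below pulls the fields out of a raw event so you can compare Value with what the search shows. This is only an illustration, not how Splunk parses the data; parse_perfmon_event is a made-up helper, and it assumes the raw event is one key=value pair per line, as in my example.

```python
raw_event = """12/24/2019 15:05:44.584 -0500
collection=Process
object=Process
counter="% Processor Time"
instance=splunk-perfmon
Value=2.9410862311424606
"""


def parse_perfmon_event(text):
    """Return the key=value fields of a raw event as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' (such as the timestamp) are skipped
            fields[key.strip()] = value.strip().strip('"')
    return fields


fields = parse_perfmon_event(raw_event)
raw_value = float(fields["Value"])
print(fields["instance"], fields["counter"], raw_value)
```

If raw_value is sensible but the indexed field is not, look at the parser. If the raw value itself is over 100, the bad number is coming from the collection side.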
In perfmon, go to the Performance node, where it gives you the overview and system summary. The lower pane shows processor information. Are the columns like 0,1 and 0,1? If so, these are cores that may contribute to the measure you're seeing.
Columns are _Total, 0,_Total, and 0,0