We are collecting Windows performance logs once every two minutes to check whether a server goes over 90% CPU usage. We are trying to track the number of minutes the value was above 90%.
The events look like this:
02/25/2019_14:12:37.949_-0500 collection=CPU object=Processor counter="%_Processor_Time" instance=1 Value=98.396306059935881194
We have tried:
index=perfmon host=* object="processor" counter="%_processor_time" instance = * | transaction host startswith=Value>90 endswith=Value<=90 |eval _duration = duration/60 | stats values(collection) values(_duration) as time_duration values(instance) by _time, host
Our results look like this:
We want to know how long the value stays above 90%. We have not been able to build a dashboard that shows the total duration between the value going over 90% and coming back below 90%. The duration in our stats table does not show the total time for which the value was above 90%.
If you are just looking for a static count of minutes for which the CPU crossed 90%, I think you can try this:
index=perfmon host=* object="processor" counter="%_processor_time" instance = * | eval minutes_above_90_pct = if(value>=90,2,0) | stats sum(minutes_above_90_pct) by host
I have used 2 in the if condition because you are collecting data every 2 minutes (assuming the CPU stayed above 90% for those 2 minutes).
Let me know if it works.
Actually, the field name is Value with a capital V. Can you try this:
index=perfmon host=* object="processor" counter="%_processor_time" instance =1 | eval minutes_above_90_pct = if(Value>=90,2,0) | stats sum(minutes_above_90_pct) as sum_minutes by host
As for displaying this in a dashboard: if the data is collected every 2 minutes and the search range is the last 10 minutes, you can do this:
....... | where sum_minutes = 10 | fields host
If your search range is longer than 10 minutes, filter the events down to the last 10 minutes and then run stats sum.
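As a rough sketch of that filtering step, the time range can be limited in the search itself with the earliest time modifier. This assumes samples arrive exactly every 2 minutes, so five samples fall inside the 10-minute window; if the window happens to catch four or six samples, the comparison against 10 would need adjusting.

```
index=perfmon host=* object="processor" counter="%_processor_time" instance=1 earliest=-10m
| eval minutes_above_90_pct = if(Value>=90,2,0)
| stats sum(minutes_above_90_pct) as sum_minutes by host
| where sum_minutes = 10
| fields host
```

This only lists hosts where every sample in the last 10 minutes was at or above 90%. It does not report how long a longer run has lasted.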
The query worked, but it shows the total time the value was above 90% across the whole search range. We are trying to get the duration of each period for which the value stayed above 90%. We were able to get the dashboard like the image, but since we used the transaction command it compares every occurrence of the value to the last occurrence. We are looking for a number for each consecutive run, such as 6 minutes or 10 minutes, for as long as the value stays above 90%.
OK, so I have used the transaction command to group all events with CPU usage >= 90%, and then the where command to filter out transactions shorter than 10 minutes. Can you try this and let me know if this is what you are looking for:
index=perfmon host=* object="processor" counter="%_processor_time" instance =1 | eval minutes_above_90_pct = if(Value>=90,2,0) | transaction host minutes_above_90_pct startswith=eval(minutes_above_90_pct==2) endswith=eval(minutes_above_90_pct==0) | where duration > 600
No results were found. We are not looking for an exact number such as 10 minutes. We are looking for a dynamic dashboard that tells us how long the value has been over 90%. It can be any amount of time. We want to calculate the number of minutes it is over 90%.
@cvssravan we are not looking for a single static number for the entire time period. We are looking for something like this:
If the Value stays above 90% for 10 consecutive minutes, we would like a dashboard that shows the number 10 and resets once the value goes below 90%.
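One possible way to get a counter that resets like this is to use streamstats instead of transaction. The following is only a sketch under some assumptions: samples arrive every 2 minutes with none missing, each sample above 90% counts as 2 minutes, and the field names above_90, consecutive_samples and minutes_above_90 are made up for illustration. The events are sorted by host and time, each one is flagged as above or below 90%, and streamstats with reset_on_change=true restarts its count whenever the host or the flag changes from the previous event.

```
index=perfmon host=* object="processor" counter="%_processor_time" instance=1
| sort 0 host _time
| eval above_90 = if(Value>90, 1, 0)
| streamstats reset_on_change=true count as consecutive_samples by host above_90
| eval minutes_above_90 = if(above_90==1, consecutive_samples*2, 0)
| table _time host Value minutes_above_90
```

With five consecutive samples above 90%, minutes_above_90 would read 2, 4, 6, 8, 10 and then drop back to 0 on the first sample at or below 90%. For a single-value dashboard panel, appending something like | stats latest(minutes_above_90) as minutes_above_90 by host should show only the current counter for each host.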