Getting PS data into CPU/Hours units

pollockm
Engager

I'm working to deploy Splunk in an HPC environment and am trying to set up some metrics queries that I didn't see in the Splunk for *nix app. Specifically, I'd like to have a timechart that shows CPU utilization per day for the month, where the units are CPU-hours (i.e., 1 CPU with 8 cores has 192 CPU-hours available per day). I'm pretty sure I need to use streamstats to get the daily values, but I'm having trouble figuring out how to get the data into the units I want.

Thanks for your insight.

Edit #1: Here's a better example of the metric I'm trying to get.
Sample system: 16 total cores
CPU-Hours per day = 16(cores)*24(hours) = 384

So if cpu.sh gives you the PercentIdle for each core at that instant, you'd need to take the
time that has passed since the last measurement for a core, multiply that by the current
percent used (100 - PercentIdle), and divide by 100.

Example:
event 1: core 0 97%Idle, core 1 98%Idle time 00:00:00
event 2: core 0 97%Idle, core 1 97%Idle time 00:01:00

For core 0 (3% used over 60 seconds):
(60 sec * 3)/100 = 1.8 CPU-seconds = 0.0005 CPU-hours

You'd add these values up in 1 day time spans.
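The arithmetic in Edit #1 can be sketched outside Splunk to check the numbers. This is a minimal Python illustration, not SPL and not the cpu.sh format: the function name `cpu_hours_per_day`, the tuple layout, and the 2024-01-01 date are all hypothetical. It also assumes each interval is credited entirely to the day of the later sample.

```python
from collections import defaultdict
from datetime import datetime


def cpu_hours_per_day(samples):
    """Sum busy CPU-hours per calendar day.

    samples: iterable of (timestamp, core, pct_idle) tuples (hypothetical layout).
    Each sample is weighted by the time elapsed since the previous sample
    for the same core, so the first sample of each core contributes nothing.
    """
    last_seen = {}                 # core -> timestamp of its previous sample
    totals = defaultdict(float)    # date -> accumulated CPU-hours
    for ts, core, pct_idle in sorted(samples, key=lambda s: s[0]):
        prev = last_seen.get(core)
        if prev is not None:
            elapsed_sec = (ts - prev).total_seconds()
            busy_sec = elapsed_sec * (100.0 - pct_idle) / 100.0
            # Assumption: the whole interval is credited to the later sample's day.
            totals[ts.date()] += busy_sec / 3600.0
        last_seen[core] = ts
    return dict(totals)


# The two events from the example above, on a made-up date.
samples = [
    (datetime(2024, 1, 1, 0, 0, 0), 0, 97.0),
    (datetime(2024, 1, 1, 0, 0, 0), 1, 98.0),
    (datetime(2024, 1, 1, 0, 1, 0), 0, 97.0),
    (datetime(2024, 1, 1, 0, 1, 0), 1, 97.0),
]
result = cpu_hours_per_day(samples)
```

With these samples, each core contributes 1.8 CPU-seconds (0.0005 CPU-hours) for the one-minute interval, so the day's total is about 0.001 CPU-hours.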

Edit #2:
I'd be okay with an answer like this (but one that actually works; this one is broken). I think I was making this more complicated than it needed to be:

index=os sourcetype=cpu host=x | eval percent_used=(100-pctIdle)|stats avg(percent_used) AS cpu_avg_used by CPU | timechart span=1d sum(((cpu_avg_used*24)/100))

Still new to writing queries.

1 Solution

somesoni2
Revered Legend

Assuming your CPU data (from cpu.sh) is like this

CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle
all       0.00       0.00       0.50       0.00      99.50
0         0.98       0.00       0.98       0.00      98.04
1         0.00       0.00       0.00       0.00     100.00 

Where 0 and 1 are the cores, try this:

index=os sourcetype=cpu
| multikv fields CPU pctIdle
| search NOT CPU="all"
| eval CPU_Hours = (100 - pctIdle)*24/100
| timechart span=1d sum(CPU_Hours) as CPU_Hours by host useother=f

Updated to account for the sampling frequency (cpu.sh runs every 5 minutes, so each sample covers 1/12 of an hour):

index=os sourcetype=cpu
| multikv fields CPU pctIdle
| search NOT CPU="all"
| eval CPU_Hours = ((100 - pctIdle)*1/12)/100
| timechart span=1d sum(CPU_Hours) as CPU_Hours by host useother=f
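As a sanity check on the 1/12 factor, here is a minimal Python sketch of the same per-sample arithmetic (not SPL). It assumes samples arrive at a fixed interval with none missing; the function name `cpu_hours_fixed_interval` and the 97%-idle test data are hypothetical.

```python
def cpu_hours_fixed_interval(pct_idle_samples, interval_minutes=5):
    """Sum CPU-hours for one core, given one pctIdle reading per fixed interval.

    Mirrors the eval above: each sample contributes
    (100 - pctIdle) / 100 * (interval in hours).
    """
    interval_hours = interval_minutes / 60.0   # 5 min -> 1/12 hour
    return sum((100.0 - p) / 100.0 * interval_hours for p in pct_idle_samples)


# One core at 97% idle for a full day of 5-minute samples (288 of them).
one_core_day = [97.0] * 288
total = cpu_hours_fixed_interval(one_core_day)
```

For this input the total comes to about 0.72 CPU-hours, which matches 3% of the 24 core-hours available in a day. If samples are dropped or the interval changes, a fixed factor will under- or over-count, which is where a per-event time delta would be needed instead.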

pollockm
Engager

Ok, thanks. I think I was making this harder than it was when I tried it.

somesoni2
Revered Legend

I had not thought about the frequency (my mistake). Since the frequency is 5 min, each sample should be multiplied by 1/12 (the hour equivalent of 5 min), and then the samples should all be added up for the day to get the daily total. For example, a 5-minute period at 80% idle means the CPU was used for 1 minute, and if the same trend holds for the whole day, it would be used for 288*(1/12)*0.2 = 4.8 hours. I will update the query.

pollockm
Engager

Ah, ok. With that change, though, I think this answer would give you too large a number. Say cpu.sh was run every 5 minutes and the core was pegged at 97% idle all day. Each event would calculate CPU_Hours to be 3*24/100 (which by itself is the right answer for the day), but then it would add up those 288 events to get that day's value. The part I'm unsure how to do is figuring out the time since the last event (that's why I thought streamstats might come into play).
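To put numbers on that concern, here is a small Python check (not SPL) under the same assumptions: one core, 97% idle all day, one sample every 5 minutes. The variable names are hypothetical.

```python
SAMPLE_MINUTES = 5
SAMPLES_PER_DAY = 24 * 60 // SAMPLE_MINUTES   # 288 samples per core per day
pct_idle = 97.0

# Weighting every sample by 24 hours, then summing all samples for the day.
per_event_x24 = (100.0 - pct_idle) * 24 / 100.0
summed_x24 = per_event_x24 * SAMPLES_PER_DAY

# Weighting every sample by its own 5-minute interval (1/12 hour), then summing.
per_event_interval = (100.0 - pct_idle) * (SAMPLE_MINUTES / 60) / 100.0
summed_interval = per_event_interval * SAMPLES_PER_DAY

overcount_factor = summed_x24 / summed_interval
```

The 24-hour weighting sums to roughly 207.36, while the per-interval weighting sums to roughly 0.72, so the first is too large by a factor equal to the number of samples per day (288).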

somesoni2
Revered Legend

My bad, it should be 24 (hours in a day), not 8. I have updated the answer accordingly.

pollockm
Engager

If each of those rows is a core, why do you need to multiply the percent NOT idle by 8? (Also, I'm not sure the units of what you calculate would be CPU Hours.) I'm working on writing up a better example. Thanks for your answer though, it's helping me think through this.

pollockm
Engager

If the total utilization for the day was 5%, I guess it would be .05*192.

somesoni2
Revered Legend

If 1 host has 1 CPU with 8 cores and an overall CPU utilization of 5%, do you need the daily average CPU utilization percent as 8*5*100/192, or the total CPU-hours used per day as 192*5/100?
