Getting PS data into CPU-hours units

Engager

I'm working to deploy Splunk in an HPC environment and am trying to set up some metrics queries that I didn't see in the Splunk for *nix app. Specifically, I'd like to have a timechart that shows CPU utilization per day for the month, where the units are CPU-hours (i.e., 1 CPU with 8 cores has 8 × 24 = 192 CPU-hours per day). I'm pretty sure I need to use streamstats to get the daily values, but I'm having trouble figuring out how to get the data into the units I want.

Thanks for your insight.

Edit #1: Here's a better example of the metric I'm trying to get.
Sample system: 16 total cores
CPU-Hours per day = 16(cores)*24(hours) = 384
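The capacity arithmetic above, plus the conversion of an average daily utilization into CPU-hours used, as a tiny Python sketch. Python is used here only to illustrate the math; it is not SPL, and the function names are hypothetical. The 8-core, 5% case corresponds to the 192*5/100 figure that comes up later in the thread.

```python
HOURS_PER_DAY = 24

def daily_capacity_cpu_hours(cores):
    """CPU-hours available per day: each core supplies one CPU-hour per hour."""
    return cores * HOURS_PER_DAY

def daily_cpu_hours_used(cores, utilization_pct):
    """CPU-hours consumed per day at a given average utilization (0-100)."""
    return daily_capacity_cpu_hours(cores) * utilization_pct / 100

print(daily_capacity_cpu_hours(16))   # 384
print(daily_cpu_hours_used(8, 5))     # 9.6
```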

So if cpu.sh gives you the PercentIdle for each core at that instant, you'd need to take the
time that has passed since the last measurement for that core, multiply it by the current
percent busy (100 - PercentIdle), and divide by 100.

Example:
event 1: core 0 97% idle, core 1 98% idle, time 00:00:00
event 2: core 0 97% idle, core 1 97% idle, time 00:01:00

For core 0:
(60 sec × 3)/100 = 1.8 CPU-seconds = 0.0005 CPU-hours

You'd add these values up over 1-day spans.
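A minimal Python sketch of this time-weighted method, done outside Splunk and assuming the samples are available as hypothetical (timestamp, core, pctIdle) tuples. It weights the time since each core's previous sample by the current sample's busy fraction, then sums per calendar day. In Splunk, computing that per-core gap is the piece the question suggests streamstats for. Two simplifying assumptions: the first sample of each core contributes nothing, and an interval that crosses midnight is credited entirely to the day of the later sample.

```python
from collections import defaultdict
from datetime import datetime

def daily_cpu_hours(samples):
    """Sum time-weighted busy CPU-hours per calendar day.

    samples: iterable of (timestamp: datetime, core: str, pct_idle: float).
    """
    last_seen = {}                # core -> timestamp of that core's previous sample
    totals = defaultdict(float)   # date -> CPU-hours
    for ts, core, pct_idle in sorted(samples, key=lambda s: s[0]):
        prev = last_seen.get(core)
        if prev is not None:
            elapsed_s = (ts - prev).total_seconds()
            cpu_seconds = elapsed_s * (100 - pct_idle) / 100   # busy share of the gap
            totals[ts.date()] += cpu_seconds / 3600            # seconds -> hours
        last_seen[core] = ts
    return dict(totals)

# The two events from the example (the date itself is made up).
events = [
    (datetime(2024, 1, 1, 0, 0, 0), "0", 97.0),
    (datetime(2024, 1, 1, 0, 0, 0), "1", 98.0),
    (datetime(2024, 1, 1, 0, 1, 0), "0", 97.0),
    (datetime(2024, 1, 1, 0, 1, 0), "1", 97.0),
]
print(daily_cpu_hours(events))
```

Core 0 contributes the 0.0005 CPU-hours worked out above, and core 1 (3% busy at event 2) adds another 0.0005, for about 0.001 CPU-hours that day.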

Edit #2:
I'd be okay with an answer like this one (but one that actually works; this one is broken). I think I was making this more complicated than it needed to be:

index=os sourcetype=cpu host=x | eval percent_used=(100-pctIdle)|stats avg(percent_used) AS cpu_avg_used by CPU | timechart span=1d sum(((cpu_avg_used*24)/100))

Still new to writing queries.
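One way to read the intent of the query above, sketched in Python rather than SPL: average the percent used per day and per core, scale each average to a 24-hour day, then sum over the cores. In the query as written, stats ... by CPU drops _time, so timechart has no _time left to bucket by; this sketch keeps the day as an explicit grouping key instead. It assumes evenly spaced samples covering the whole day (a partial day would still be scaled to 24 hours), and the input format is hypothetical.

```python
from collections import defaultdict

def daily_cpu_hours_from_averages(rows):
    """rows: iterable of (day, core, pct_idle).

    Average percent-used per (day, core), convert each average to CPU-hours
    for a 24-hour day, then sum over cores for each day.
    """
    used = defaultdict(list)   # (day, core) -> percent-used samples
    for day, core, pct_idle in rows:
        used[(day, core)].append(100 - pct_idle)
    per_day = defaultdict(float)
    for (day, _core), values in used.items():
        avg_used = sum(values) / len(values)
        per_day[day] += avg_used * 24 / 100
    return dict(per_day)

rows = [
    ("2024-01-01", "0", 97.0), ("2024-01-01", "0", 97.0),
    ("2024-01-01", "1", 98.0), ("2024-01-01", "1", 96.0),
]
print(daily_cpu_hours_from_averages(rows))
```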

SplunkTrust

Assuming your CPU data (from cpu.sh) looks like this:

CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle
all       0.00       0.00       0.50       0.00      99.50
0         0.98       0.00       0.98       0.00      98.04
1         0.00       0.00       0.00       0.00     100.00 

where 0 and 1 are the cores, try this:

index=os sourcetype=cpu
| multikv fields CPU pctIdle
| search NOT CPU="all"
| eval CPU_Hours = (100 - pctIdle)*24/100
| timechart span=1d sum(CPU_Hours) as CPU_Hours by host useother=f
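For readers unfamiliar with multikv, here is a rough Python approximation of what the first three lines of the query do to the table above: treat the header row as field names, turn each following row into a record, and drop the aggregate CPU="all" row. This is only an illustration of the data shape, not Splunk's implementation, and the function name is hypothetical.

```python
def parse_cpu_table(text):
    """Parse whitespace-aligned cpu.sh-style output into one dict per row,
    keyed by the header names, dropping the aggregate CPU="all" row."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    records = []
    for values in rows:
        record = dict(zip(header, values))
        if record["CPU"] == "all":
            continue                       # like: search NOT CPU="all"
        record["pctIdle"] = float(record["pctIdle"])
        records.append(record)
    return records

SAMPLE = """
CPU    pctUser    pctNice  pctSystem  pctIowait    pctIdle
all       0.00       0.00       0.50       0.00      99.50
0         0.98       0.00       0.98       0.00      98.04
1         0.00       0.00       0.00       0.00     100.00
"""

for r in parse_cpu_table(SAMPLE):
    print(r["CPU"], round(100 - r["pctIdle"], 2))   # core id, percent busy
```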

Update: to include the sampling frequency in the calculation (cpu.sh is running every 5 minutes):

index=os sourcetype=cpu
| multikv fields CPU pctIdle
| search NOT CPU="all"
| eval CPU_Hours = ((100 - pctIdle)*1/12)/100
| timechart span=1d sum(CPU_Hours) as CPU_Hours by host useother=f
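The same fixed-interval weighting as the updated eval, sketched in Python for one core: each sample's busy fraction is multiplied by the sampling interval in hours (5 minutes = 1/12 hour), and the products are summed. The interval_minutes parameter is a hypothetical generalization; it matters only if cpu.sh is scheduled at a different interval, since the weighting is correct only when it matches the real one.

```python
def cpu_hours_fixed_interval(pct_idle_samples, interval_minutes=5):
    """Sum busy CPU-hours for one core, assuming every sample represents
    a fixed interval (5 minutes -> 1/12 hour each)."""
    interval_hours = interval_minutes / 60
    return sum((100 - idle) / 100 * interval_hours for idle in pct_idle_samples)

# 288 five-minute samples at 80% idle cover one full day: about 4.8 CPU-hours.
print(cpu_hours_fixed_interval([80.0] * 288))
```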

Engager

OK, thanks. I think I was making this harder than it needed to be when I tried it.

SplunkTrust

I hadn't thought of the frequency (my mistake, I know). Since cpu.sh runs every 5 minutes, each sample's value should be multiplied by 1/12 (5 minutes expressed in hours), and then all samples should be added up per day to get the daily total. For example, a 5-minute period with 80% idle means the CPU was busy for 1 minute; if the same pattern holds all day, it was busy for 288 × (1/12) × 0.2 = 4.8 hours. I will update the query.

Engager

Ah, OK. With that change, I think this answer would give you too large a number. For example, say cpu.sh runs every 5 minutes and the core sits at 97% idle all day. Each event would calculate CPU_Hours as 3*24/100 = 0.72 (which by itself is the right answer for the day), but then the query would add up all 288 events for the day to get that day's value. The part I'm unsure how to do is figuring out the time since the last event (that's why I thought streamstats might come into play).
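A quick Python check of the overcount described here, assuming one core at a constant 97% idle sampled every 5 minutes (288 samples per day). Scaling every sample to a full day and then summing, as the first query's sum(CPU_Hours) over (100 - pctIdle)*24/100 does, inflates the result by the number of samples per day. The function names are hypothetical.

```python
def first_query_total(pct_idle_samples):
    # Each sample is scaled to a full 24-hour day, then all samples are summed.
    return sum((100 - idle) * 24 / 100 for idle in pct_idle_samples)

def expected_total(pct_idle_samples):
    # Average busy percentage over evenly spaced samples, scaled to 24 hours.
    avg_busy = sum(100 - idle for idle in pct_idle_samples) / len(pct_idle_samples)
    return avg_busy * 24 / 100

samples = [97.0] * 288   # one core at 97% idle, sampled every 5 minutes for a day
print(first_query_total(samples), expected_total(samples))
```

The first figure comes out 288 times larger than the second (about 207.36 versus 0.72 CPU-hours).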

SplunkTrust

My bad, it should be 24 (hours in a day), not 8. I have updated the query.

Engager

If each of those rows is a core, why do you need to multiply the percent NOT idle by 8? (Also, I'm not sure the units of what you calculate would be CPU-hours.) I'm working on writing up a better example. Thanks for your answer, though; it's helping me think through this.

Engager

If the total utilization for the day was 5%, I guess it would be 0.05 × 192 = 9.6 CPU-hours.

SplunkTrust

If 1 host has 1 CPU with 8 cores and an overall CPU utilization of 5%, do you need the daily average CPU utilization percentage as 8*5*100/192, or the total CPU-hours used per day as 192*5/100?
