Splunk Search

How do I subtotal processor utilization?

NickJLange
Explorer

Disclaimer: I'm not saying this particular example is useful analysis - I'm just not sure how to think about solving a problem like this in Splunk properly.

I have thousands of Zabbix events in which socket-wide data points are normalized into individual events, e.g. system.cpu.util[socket,core,type], across heterogeneous hardware configurations (the number of sockets and cores differs between machines).
I want to understand how load is distributed across a socket, by machine model type, to check that it matches the temperature readings, and then flag outliers (on either temperature or idle cores).

I've seen tricks for extracting the itemKey into named fields, which I think work because the timestamps are exactly the same. But how do you run stats on fields that might not exist (e.g. socket 4 or core 20)?

Does any of this make sense?

0 Karma

jkat54
SplunkTrust
  ... host=hostname | eval socket=if(isnull(socket),"null",socket) | timechart avg(value) max(value) by socket

and

  ... host=hostname | eval core=if(isnull(core),"null",core) | timechart avg(value) max(value) by core

should be fine on a host-by-host basis. Both would work well on a dashboard with a drop-down list to select the hostname, etc.

  ... | eval socket=if(isnull(socket),"null",socket) | eval core=if(isnull(core),"null",core) | stats avg(value) max(value) by host core socket

The above should be fine for an analyst to select specific time ranges with the time picker and see whether activity spikes occurred, etc.
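
As a side note, the two eval/isnull steps above could probably be collapsed with fillnull. This is only a sketch of an equivalent form, assuming socket and core are already extracted fields as in the searches above:

  ... | fillnull value="null" socket core | stats avg(value) max(value) by host core socket

fillnull writes the literal string "null" into socket or core wherever the field is missing, so those events still land in a group instead of being dropped by the by clause.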

0 Karma

NickJLange
Explorer

Thank you for the helpful suggestion. I'm looking for more aggregate trends across a class of hosts with different underlying hardware models - which sort of precludes individual host analysis with eyeballs...

0 Karma

martin_mueller
SplunkTrust

Do provide some sample data.

0 Karma

NickJLange
Explorer

It's not very exciting (one row per pseudo-event):

_time,host,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int
....
_time,hostN,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int

0 Karma

NickJLange
Explorer

Currently, the query uses rex to extract #socket and #core into new fields.
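
For anyone following along, that extraction might look roughly like the sketch below. It assumes the itemKey layout from the sample rows (metric name first, then a numeric socket and a numeric core); the field names metric, socket and core are illustrative, not taken from the real search.

  ... | rex field=itemKey "system\.cpu\.util\[(?<metric>[^,\]]*),(?<socket>\d+),(?<core>\d+)"
      | stats avg(value) AS avg_util max(value) AS max_util BY host socket

Because stats only emits rows for the host/socket combinations that actually occur in the events, a socket 4 that exists on some machines and not on others does not need to be declared up front; hosts without it simply produce no row for it.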

0 Karma

somesoni2
Revered Legend

What will the value field contain?

0 Karma

NickJLange
Explorer

An integer from 1 to 100 representing utilization, the equivalent of /proc/stat.

0 Karma

somesoni2
Revered Legend

Is the list of possible socket/core values fixed?

0 Karma

NickJLange
Explorer

It is hard to predict the socket/core count... but it is a finite set.

0 Karma
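
One possible direction for the aggregate view, given the details above, is to subtotal per core, roll that up per socket, and then compare each socket against its hardware class. The following is a rough sketch, not a tested solution. It assumes a field named model is available on each event (for example from a lookup keyed on host; that field is hypothetical and not in the sample data), and it uses an arbitrary two-standard-deviation threshold for "outlier".

  ... | rex field=itemKey "system\.cpu\.util\[(?<metric>[^,\]]*),(?<socket>\d+),(?<core>\d+)"
      | stats avg(value) AS core_util BY host model socket core
      | stats avg(core_util) AS socket_util stdev(core_util) AS core_spread dc(core) AS cores BY host model socket
      | eventstats avg(socket_util) AS model_avg stdev(socket_util) AS model_stdev BY model
      | where abs(socket_util - model_avg) > 2 * model_stdev

The first stats produces one row per core that actually reported data, the second subtotals those rows per socket, and eventstats attaches the per-model average and spread to every socket row so the final where can keep only the sockets that sit far from their class. Since every grouping is driven by the values present in the data, differing socket and core counts across models should not need special handling.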