Splunk Search

How do I subtotal processor utilization?

NickJLange
Explorer

Disclaimer: I'm not saying this particular example is useful analysis - I'm just not sure how to think about solving a problem like this in Splunk properly.

I have thousands of Zabbix events in which socket-wide data points are normalized into individual events, i.e. one event per system.cpu.util[socket,core,type] item, across heterogeneous hardware configurations (the number of sockets or cores differs between machines).
I want to understand the distribution of load across a socket by machine model type to ensure it matches up with temperature readings, and then flag outliers (on either temperature or idle cores).

I've seen tricks for extracting the itemKey into named fields, which I think works because the timestamps are exactly the same. But how do you run stats on fields that might not exist (e.g. socket 4 or core 20)?

Does any of this make sense?

0 Karma

jkat54
SplunkTrust
  ... host=hostname | eval socket=if(isnull(socket),"null",socket) | timechart avg(value) max(value) by socket

AND

  ... host=hostname | eval core=if(isnull(core),"null",core) | timechart avg(value) max(value) by core

Either should be fine on a host-by-host basis. Both would work well on a dashboard with a drop-down list to select the hostname.

  ... | eval socket=if(isnull(socket),"null",socket) | eval core=if(isnull(core),"null",core) | stats avg(value) max(value) by host core socket

The above should be fine for an analyst to select specific time ranges with the time picker and see whether activity spikes occurred.
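
As a sketch of the same idea, fillnull can set the placeholder on both fields in one step. This assumes socket and core have already been extracted as fields; the output names avg_util and max_util are illustrative only.

```spl
... | fillnull value="null" socket core
| stats avg(value) AS avg_util max(value) AS max_util BY host core socket
```

Filling the placeholder matters because stats drops any event that is missing one of its BY fields.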

0 Karma

NickJLange
Explorer

Thank you for the helpful suggestion. I'm looking for more aggregate trends across a class of hosts with different underlying hardware models - which sort of precludes individual host analysis with eyeballs...

0 Karma

martin_mueller
SplunkTrust

Do provide some sample data.

0 Karma

NickJLange
Explorer

It's not very exciting (one row per pseudo-event):

_time,host,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int
....
_time,hostN,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int

0 Karma

NickJLange
Explorer

Currently, the query uses rex to extract #socket and #core into new fields.
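
A minimal sketch of what such an extraction might look like, assuming the itemKey layout from the sample rows (metric name, then socket, then core, then an empty trailing parameter). The field names metric, socket, and core and the output names avg_util and max_util are illustrative choices, and the leading ... stands for whatever base search returns the Zabbix events.

```spl
... | rex field=itemKey "^system\.cpu\.util\[(?<metric>[^,\]]*),(?<socket>[^,\]]*),(?<core>[^,\]]*)"
| stats avg(value) AS avg_util max(value) AS max_util BY host socket core
```

The rex step pulls the first three comma-separated parameters out of the brackets, and the stats step then produces one row per host, socket, and core combination that actually occurs in the data.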

0 Karma

somesoni2
Revered Legend

What will the value field contain?

0 Karma

NickJLange
Explorer

An integer from 1 to 100 representing utilization, the equivalent of /proc/stat.

0 Karma

somesoni2
Revered Legend

Is the list of possible socket/core values fixed?

0 Karma

NickJLange
Explorer

It is hard to predict the socket/core count... but it is a finite set.
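
Because the set is finite but not known in advance, one possible approach is to keep socket and core as row values and group by them, rather than pivoting each socket or core into its own column. The sketch below is self-contained: it builds four made-up sample rows with makeresults (the hosts and values are invented for illustration only), extracts socket and core with an assumed rex pattern, and then uses eventstats to attach a per-socket subtotal to every core row.

```spl
| makeresults
| eval raw="hostA system.cpu.util[user_utilization,0,0,] 40;hostA system.cpu.util[user_utilization,0,1,] 60;hostA system.cpu.util[user_utilization,1,0,] 10;hostB system.cpu.util[user_utilization,0,0,] 80"
| makemv delim=";" raw
| mvexpand raw
| rex field=raw "^(?<host>\S+)\s+(?<itemKey>\S+)\s+(?<value>\d+)$"
| rex field=itemKey "^system\.cpu\.util\[(?<metric>[^,\]]*),(?<socket>[^,\]]*),(?<core>[^,\]]*)"
| stats avg(value) AS core_avg BY host socket core
| eventstats avg(core_avg) AS socket_avg dc(core) AS cores_seen BY host socket
| eval delta_from_socket=core_avg-socket_avg
```

In this sample hostB has no socket 1 rows, so it simply contributes nothing to that group and no placeholder fields are needed. Rolling the result up by model type would additionally need a host-to-model mapping (for example a lookup), which this thread does not describe.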

0 Karma