Knowledge Management

Is anyone using disk queue length instead of %_Disk_Time for disk metrics?

spluzer
Communicator

Hey Splunkers,

Just wondering if anyone has some good suggestions for better disk metrics.

We currently use %_Disk_Time (among other counters) for performance monitoring on our hosts.

Knowing that a drive is 100% busy is somewhat useful, but that fact on its own doesn't tell us much.

I would expect drives that hold data to be close to 100% busy, because they are being read from or written to almost all of the time, especially on some of our more heavily used systems.

From my own basic knowledge, that metric combined with “Average Disk Queue Length” would be more relevant: if a disk is busy almost all of the time and there is also a large queue, the disk might be a bottleneck and require further investigation.
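
A minimal sketch of that combination, assuming the "% Disk Time" and "Avg. Disk Queue Length" counters are collected in the same PerfmonMk:LogicalDisk data and extracted as %_Disk_Time and Avg._Disk_Queue_Length (check the field names in your own environment). The thresholds of 90% busy and a queue above 2 are illustrative placeholders, not recommendations. The inline comments use Splunk's triple-backtick comment syntax, which requires a recent Splunk version.

````spl
index=windows sourcetype="PerfmonMk:LogicalDisk" instance!="_Total" ```skip the aggregate _Total instance```
| rename "%_Disk_Time" as pct_disk_time, "Avg._Disk_Queue_Length" as avg_queue ```assumed PerfmonMk field names```
| stats avg(pct_disk_time) as busy_pct avg(avg_queue) as queue by host instance
| where busy_pct>90 AND queue>2 ```illustrative thresholds, tune for your environment```
| sort - queue
````

A disk that shows up here is both busy and queuing requests, which is a better bottleneck candidate than one that is merely busy.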

However, I imagine the RAID configuration needs to be factored in, which I'm not sure about, so I'm wondering how others are doing it. Any help is much appreciated.
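
One common way to factor in RAID is to divide the queue length by the number of physical disks (spindles) behind each volume, since an array can service several requests in parallel; a sustained queue of more than about 2 per spindle is a frequently quoted rule of thumb. The LogicalDisk counters do not report how many disks sit behind a drive, so this sketch assumes a hypothetical lookup definition named disk_spindles, with host, instance, and spindles columns that you would maintain yourself. Comments use Splunk's triple-backtick syntax.

````spl
index=windows sourcetype="PerfmonMk:LogicalDisk" instance!="_Total"
| rename "Avg._Disk_Queue_Length" as avg_queue ```assumed PerfmonMk field name```
| stats avg(avg_queue) as queue by host instance
| lookup disk_spindles host instance OUTPUT spindles ```hypothetical lookup: physical disks per volume```
| fillnull value=1 spindles ```treat volumes missing from the lookup as a single disk```
| eval queue_per_spindle=round(queue/spindles, 2)
| where queue_per_spindle>2 ```rule-of-thumb threshold, not a hard limit```
| sort - queue_per_spindle
````

The fillnull step keeps unmapped volumes visible instead of dropping them, at the cost of treating them as single disks until they are added to the lookup.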

I'm currently playing around with it like this:

index=windows sourcetype="PerfmonMk:LogicalDisk"
| stats avg(Current_Disk_Queue_Length) as average by host instance
| search average>1
| sort - average
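
Averaging over the whole time range can hide whether a queue is sustained or just a short burst. A variation on the search above, sketched here with an arbitrary 5-minute span and a limit of 10 series, charts the same counter over time for each host and drive (the comment uses Splunk's triple-backtick syntax):

````spl
index=windows sourcetype="PerfmonMk:LogicalDisk" instance!="_Total"
| eval host_disk=host.":".instance ```one series per host and drive```
| timechart span=5m limit=10 avg(Current_Disk_Queue_Length) by host_disk
````

Series that stay above the threshold across many consecutive buckets are more likely to be real bottlenecks than series with a single spike.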

