Indexer I/O latency on reads and writes

bport15
Path Finder

I am looking to show I/O latency on our indexers, broken down by reads and writes. The Monitoring Console shows total IOPS, but we'd like to go more granular than that: we want to know whether our disk latency is caused by reads or by writes on our hot/warm and cold mounts.

I'm looking at the introspection logs, at the fields listed below, and it's not clear to me from their descriptions whether reads_kb_ps and writes_kb_ps are the fields that will provide this data.

avg_service_ms: Average time requests caused the CPU to be in use, in milliseconds.
avg_total_ms: Average queue + execution time for requests to be completed, in milliseconds.
cpu_pct: Percentage of time the CPU was servicing requests.
device: Device name (e.g., as listed under /dev on UNIX).
fs_type: Mounted device file system type.
interval: Interval over which sampling occurred, in seconds.
mount_point: Mount point(s) of the underlying device.
reads_kb_ps: Total number of kb read per second.
reads_ps: Number of read requests per second.
writes_kb_ps: Total number of kb written per second.
writes_ps: Number of write requests per second.

I've looked all over and haven't been able to find anything helpful. I feel like someone else has to be doing this type of performance metric.

This is what the Monitoring Console has as the IOPS search. How can I pick this apart to give me what I'm looking for?

index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost> 
| eval mount_point = 'data.mount_point' 
| eval reads_ps = 'data.reads_ps' 
| eval writes_ps = 'data.writes_ps' 
| eval interval = 'data.interval' 
| eval op_count = (reads_ps + writes_ps) * interval 
| eval avg_service_ms = 'data.avg_service_ms' 
| eval avg_wait_ms = 'data.avg_total_ms' 
| eval cpu_pct = 'data.cpu_pct' 
| eval network_pct = 'data.network_pct' 
| timechart minspan=60s partial=f per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct 
| eval iops = round(iops) 
| eval avg_cpu_pct = round(avg_cpu_pct) 
| eval avg_service_ms = round(avg_service_ms) 
| eval avg_wait_ms = round(avg_wait_ms) 
| eval avg_network_pct = round(avg_network_pct) 
| fields _time, iops avg_wait_ms 
| rename avg_wait_ms as "Wait Time (ms)"
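
One way to pick the search above apart is to chart reads and writes as separate series next to the combined wait time. The field list above only has a single avg_total_ms, with no separate read and write latency, so this sketch can only show whether latency spikes line up with read load or write load. It uses only fields from the list above; <mount_point> is a placeholder for your hot/warm or cold mount, and filtering on data.mount_point in the base search is an assumption about how that field is populated in your environment.

```spl
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost> data.mount_point="<mount_point>"
| timechart minspan=60s partial=f avg(data.reads_ps) as read_iops, avg(data.writes_ps) as write_iops, avg(data.reads_kb_ps) as read_kb_ps, avg(data.writes_kb_ps) as write_kb_ps, avg(data.avg_total_ms) as avg_wait_ms
| eval read_iops = round(read_iops)
| eval write_iops = round(write_iops)
| eval read_kb_ps = round(read_kb_ps)
| eval write_kb_ps = round(write_kb_ps)
| eval avg_wait_ms = round(avg_wait_ms)
| rename avg_wait_ms as "Wait Time (ms)"
```

Putting "Wait Time (ms)" on a chart overlay axis should make it easier to see which of the two request rates moves together with the latency.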

molinarf
Communicator

@gjanders,
I found an app, with an associated TA and SA, that provides that kind of data. Look at the Metricator app:

https://splunkbase.splunk.com/app/3947/
https://splunkbase.splunk.com/app/3948/ (Technical Add On)
https://splunkbase.splunk.com/app/3949/ (Support App)

gjanders
SplunkTrust

I have been using this app, and its predecessor app Nmon for Splunk, for a long time.

Great app

molinarf
Communicator

Did you ever create alerts for disk performance from this app? I am trying to develop alerts based on different iostat metrics.

gjanders
SplunkTrust

No, I never created alerts from this one. Good luck.

ddrillic
Ultra Champion

What latency are you seeing? Is it seconds or minutes?

bport15
Path Finder

We're seeing a wide variety. Some servers show 4-30 ms on the hot/warm disk, while others show up to around 2500 ms, with spikes above that. I haven't looked at our cold disk yet because the majority of our Splunk users are hitting the warm buckets. I'm looking to show historic values of I/O latency, not just what's currently going on.
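
To compare servers like this side by side, a rough sketch is to chart the combined wait time per indexer for a single mount. <hot_warm_mount> is a placeholder, and the match on data.mount_point is an assumption about how the mount appears in your introspection data.

```spl
index=_introspection sourcetype=splunk_resource_usage component=IOStats data.mount_point="<hot_warm_mount>"
| timechart span=1h limit=0 avg(data.avg_total_ms) by host
```

Each indexer becomes its own series, so the hosts sitting near 2500 ms should stand out from the ones in the 4-30 ms range.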

gjanders
SplunkTrust

Also refer to "What is the best app to monitor Linux in Splunk?". sar / iostat will work just fine, but you might want to look at the linked answer so you can get this data into Splunk easily and have prebuilt dashboards.

mpreddy
Communicator

bport15,

Try installing the "sysstat" package on your Linux server and check reads and writes using the "sar" command.

You can use the "iostat" and "sar" commands to find latency.

bport15
Path Finder

Thanks mpreddy. We can look on the box for current latency stats, but I need to look at historic values as well: the previous 6 weeks, for example. So I need to be able to graph something within Splunk.
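
For a historic view inside Splunk, a minimal sketch against the introspection data could look like the search below. It assumes the _introspection index actually retains 6 weeks of events (I believe the default retention is shorter, around 14 days, so check the index settings first). <mount_point> is a placeholder, and read_share_pct is a helper field made up for this example.

```spl
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost> data.mount_point="<mount_point>" earliest=-6w@d latest=now
| timechart span=1h avg(data.avg_total_ms) as avg_wait_ms, max(data.avg_total_ms) as max_wait_ms, avg(data.reads_ps) as read_iops, avg(data.writes_ps) as write_iops
| eval read_share_pct = round(100 * read_iops / (read_iops + write_iops), 1)
```

The read share is only a hint at which side dominates when wait time climbs. It is not a per-operation latency, which would need iostat or sar data (r_await / w_await) indexed from the hosts.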
