I am looking to show I/O latency on our indexers, broken out by reads and writes. The Monitoring Console shows total IOPS, but we'd like to go a little more granular than that: we want to know whether our disk latency is caused by reads or writes on our hot/warm and cold mounts.
I'm looking at the introspection logs, at the fields listed below, and based on their descriptions it's not clear to me whether reads_kb_ps and writes_kb_ps are the fields that will provide this data.
avg_service_ms: Average time requests caused the CPU to be in use, in milliseconds.
avg_total_ms: Average queue + execution time for requests to be completed, in milliseconds.
cpu_pct: Percentage of time the CPU was servicing requests.
device: Device name (e.g., as listed under /dev on UNIX).
fs_type: Mounted device file system type.
interval: Interval over which sampling occurred, in seconds.
mount_point: Mount point(s) of the underlying device.
reads_kb_ps: Total number of kb read per second.
reads_ps: Number of read requests per second.
writes_kb_ps: Total number of kb written per second.
writes_ps: Number of write requests per second.
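Note that reads_kb_ps and writes_kb_ps measure throughput (KB transferred per second), not latency, and the IOStats component reports avg_service_ms / avg_total_ms for the device as a whole rather than split by operation type. One workaround is to chart the wait time per mount point alongside the read and write request rates, so you can see whether latency spikes line up with read-heavy or write-heavy periods. A sketch against the same index and sourcetype (<myhost> is a placeholder, as in the Monitoring Console search):

```
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost>
| eval mount_point = 'data.mount_point'
| timechart minspan=60s
    avg(data.avg_total_ms) AS wait_ms,
    avg(data.reads_ps)     AS reads_ps,
    avg(data.writes_ps)    AS writes_ps
  BY mount_point
```

If wait_ms climbs only when writes_ps climbs (e.g., during heavy indexing), writes are the likely culprit; if it tracks reads_ps (e.g., during search load), reads are.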
I've looked all over and haven't been able to find anything helpful. I feel like someone else must be doing this type of performance metric.
This is the IOPS search the Monitoring Console uses. How can I pick this apart to give me what I'm looking for?
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost>
| eval mount_point = 'data.mount_point'
| eval reads_ps = 'data.reads_ps'
| eval writes_ps = 'data.writes_ps'
| eval interval = 'data.interval'
| eval op_count = (reads_ps + writes_ps) * interval
| eval avg_service_ms = 'data.avg_service_ms'
| eval avg_wait_ms = 'data.avg_total_ms'
| eval cpu_pct = 'data.cpu_pct'
| eval network_pct = 'data.network_pct'
| timechart minspan=60s partial=f per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct
| eval iops = round(iops)
| eval avg_cpu_pct = round(avg_cpu_pct)
| eval avg_service_ms = round(avg_service_ms)
| eval avg_wait_ms = round(avg_wait_ms)
| eval avg_network_pct = round(avg_network_pct)
| fields _time, iops avg_wait_ms
| rename avg_wait_ms as "Wait Time (ms)"
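The Monitoring Console search collapses reads and writes into a single op_count before charting. To get separate read and write series, you can keep them apart instead of summing them. A sketch of one way to adapt it (same index/sourcetype assumptions as above; field names after AS are arbitrary):

```
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<myhost>
| eval mount_point = 'data.mount_point'
| eval read_ops  = 'data.reads_ps'  * 'data.interval'
| eval write_ops = 'data.writes_ps' * 'data.interval'
| timechart minspan=60s partial=f
    per_second(read_ops)   AS read_iops,
    per_second(write_ops)  AS write_iops,
    avg(data.reads_kb_ps)  AS read_kb_ps,
    avg(data.writes_kb_ps) AS write_kb_ps,
    avg(data.avg_total_ms) AS wait_ms
```

This separates read vs. write volume (IOPS and KB/s), but the wait time remains a combined figure because the introspection data does not expose per-operation latency; correlating wait_ms against read_iops and write_iops is as close as this sourcetype gets.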
I have been using this app for a long time, and its predecessor app nmon for Splunk before that.
We're seeing a wide range. Some servers are showing 4-30 ms on our hot/warm disk, and other servers are showing up towards 2500 ms on our hot/warm disk, with spikes above that. I haven't even looked at our cold disk yet because the majority of our Splunk users are hitting the warm buckets. I'm looking to show historic values of I/O latency, not just what's currently going on.
Also refer to "What is the best app to monitor Linux in Splunk?". sar / iostat will work just fine, but you might want to look at the linked answer so you can get this data into Splunk easily and have prebuilt dashboards.
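Since the introspection data can't split latency by operation type, one option is to index iostat's extended stats yourself. On Linux with sysstat, `iostat -dx` reports r_await and w_await, the average per-read and per-write latencies in milliseconds, which is exactly the breakdown being asked for. A sketch of a scripted-input-style parser that turns that output into key=value events Splunk can index; the heredoc sample (sysstat 11-style columns) stands in for live output, which you would get by piping `iostat -dx 60 2` into the same function. Column layout varies by sysstat version, so the header row is used to find the columns by name rather than by position:

```shell
#!/bin/sh
# Sketch: convert `iostat -dx` extended stats into key=value lines.
# r_await / w_await = average latency of reads / writes, in ms.
parse_iostat() {
  awk '
    /^Device/ {                          # header row: map column name -> index
        for (i = 1; i <= NF; i++) col[$i] = i
        next
    }
    col["r_await"] && NF >= col["w_await"] {
        printf "device=%s r_await=%s w_await=%s\n",
               $1, $(col["r_await"]), $(col["w_await"])
    }'
}

# Sample input for illustration; in a real scripted input, replace the
# heredoc with:  iostat -dx 60 2 | parse_iostat
parse_iostat <<'EOF'
Device rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.01 1.20 3.50 8.40 120.0 340.0 38.6 0.12 9.8 4.2 12.1 0.9 2.1
EOF
```

Once indexed (sourcetype name is up to you), a simple `timechart avg(r_await), avg(w_await) by device` gives you the historic read-vs-write latency graph, for as far back as your retention allows.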
Thanks mpreddy. We can look on the box for current latency stats, but I need to look at historic values as well, the previous 6 weeks, for example. So I need to be able to graph something within Splunk.