Monitoring Splunk

Investigating high IO on indexers

jarush
Explorer

We've recently migrated from 12 indexers per site on a slower storage array to 24 indexers per site on much faster storage arrays. Since the move we have seen IO throughput on the indexer LUNs peak at around 6-8 GB/s per site, for anywhere between 5 and 30 minutes at a time. When that happens we start getting throttled by the storage array and latency goes up (as expected). We'd like to dig into the queries that are running at those times and see if we can do something about them (delete them, rewrite them, add data models, etc.).

It's pretty easy to query the _internal index for sourcetype=scheduler and look at runtimes, etc. However, that doesn't tell us how many buckets or slices the indexers had to examine to satisfy each search.
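
For reference, a minimal sketch of that kind of scheduler runtime search, run over the window when the throttling occurs:

index=_internal sourcetype=scheduler
| stats count avg(run_time) as avg_run_time max(run_time) as max_run_time by app savedsearch_name user
| sort - max_run_time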

Does anyone have recommendations, example searches, etc, that we can use to dig into this?

1 Solution

gjanders
SplunkTrust

In Alerts for Splunk Admins (or its GitHub repository) there are a few dashboards:
troubleshooting_indexer_cpu.xml
troubleshooting_resource_usage_per_user.xml

Or, report-wise, summary/metrics searches such as:
SearchHeadLevel - platform_stats.user_stats.introspection metrics populating search

would give you similar info, but it's more designed to output to a metrics index for later use...


jarush
Explorer

We have the app installed, but I'm not really seeing anything in there that would help me drill into the searches that are consuming the most IO. Is there a particular one you are thinking of?


gjanders
SplunkTrust

"Total read MB" in the mentioned dashboard, for example, comes from the introspection logs and relates to I/O, although it measures I/O from searches, not ingestion.

jarush
Explorer

Thanks, this got me going in the right direction. The culprit ended up being the Splunk CS Toolkit app in the DMC. There were two queries that were destroying our storage: Splunk_index_lookup_genator and sta_forwarder_inventory. Using the query below we found they were doing two orders of magnitude more IO than all other queries:
index=_introspection host=* source=*resource_usage.log component=PerProcess data.process_type="search"
| stats sum(data.read_mb) by data.search_props.app, data.search_props.label
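
A slightly extended variant of that search makes it easier to rank offenders by user as well; the data.search_props.user field and the splunk_resource_usage sourcetype are as they typically appear for resource_usage.log, so adjust if your fields differ:

index=_introspection sourcetype=splunk_resource_usage component=PerProcess data.process_type="search"
| stats sum(data.read_mb) as total_read_mb by data.search_props.app data.search_props.user data.search_props.label
| sort - total_read_mb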


satyenshahusda
Engager

Curious, what version of Splunk are you running? We recently had I/O issues on 8.0.1 that started spontaneously a couple of weekends ago.
Restarting individual indexers resolved it.

jarush
Explorer

7.3.3 across the board


ivanreis
Builder

Please verify the array IOPS specification. According to the Splunk docs, each drive should provide 200 average IOPS, and the disks should be configured in a RAID 1+0 fault-tolerance scheme.
Here is the document with more information -> https://docs.splunk.com/Documentation/Splunk/8.0.2/Capacity/Referencehardware#Disk_subsystem
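
To compare the observed rates against that specification using Splunk's own instrumentation, something along these lines may help (the IOStats component and its data.reads_ps / data.writes_ps fields are assumptions based on typical resource_usage.log content; check your own logs for the exact field names):

index=_introspection sourcetype=splunk_resource_usage component=IOStats
| timechart span=1m avg(data.reads_ps) as avg_reads_ps avg(data.writes_ps) as avg_writes_ps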

Also check these steps to troubleshoot indexing performance issues:
https://docs.splunk.com/Documentation/Splunk/8.0.2/Troubleshooting/Troubleshootindexingperformance

and

https://docs.splunk.com/Documentation/Splunk/8.0.2/Troubleshooting/Troubleshootingeventsindexingdela...
