Getting Data In

Need to schedule IO wait alerts on Splunk

vikram_m
Path Finder

Our Splunk infrastructure is on Azure, and we recently faced a major issue where I/O wait time was high, so indexing and all the data pipeline queues were affected.

As an RCA item, we have decided to schedule I/O wait time alerts on the infrastructure, so that we can tell whether something is wrong in our Splunk config or whether Azure storage is what is backing up the data pipeline queues.

Please let us know how we can schedule I/O wait alerts on Splunk.

Thanks.
Vikram.

adonio
Ultra Champion

Hello there, I might be off with my answer, but I thought it worthwhile to bring this to your attention, and I needed to post it as an answer in order to attach screenshots.
You can use the DMC (or MC); it has pre-built alerts on indexing queues and indexer performance (see screenshot 1).
You can also navigate in the DMC to Resource Usage: Machine and scroll down to see the I/O graph.
Opening that panel in search will show you the following:

 `dmc_set_index_introspection` sourcetype=splunk_resource_usage component=IOStats host=<yourHost>
              | eval mount_point = 'data.mount_point'
              | eval reads_ps = 'data.reads_ps'
              | eval writes_ps = 'data.writes_ps'
              | eval interval = 'data.interval'
              | eval op_count = (reads_ps + writes_ps) * interval
              | eval avg_service_ms = 'data.avg_service_ms'
              | eval avg_wait_ms = 'data.avg_total_ms'
              | eval cpu_pct = 'data.cpu_pct'
              | eval network_pct = 'data.network_pct'
              | `dmc_timechart_for_iostats` per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct
              | eval iops = round(iops)
              | eval avg_cpu_pct = round(avg_cpu_pct)
              | eval avg_service_ms = round(avg_service_ms)
              | eval avg_wait_ms = round(avg_wait_ms)
              | eval avg_network_pct = round(avg_network_pct)
              | fields _time, iops avg_wait_ms
              | rename avg_wait_ms as "Wait Time"

You can modify this search and use it as a base for your alerts.
Hope it helps.
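
As a rough sketch of what an alert search built on that base might look like: the version below drops the timechart and instead averages the wait time per host and mount point, keeping only rows above a threshold. It assumes the `dmc_set_index_introspection` macro resolves to index=_introspection on your deployment, and the 20 ms threshold is only a placeholder to tune against your own baseline.

```spl
index=_introspection sourcetype=splunk_resource_usage component=IOStats
| eval mount_point = 'data.mount_point'
| stats avg(data.avg_total_ms) as avg_wait_ms by host, mount_point
| eval avg_wait_ms = round(avg_wait_ms, 1)
| where avg_wait_ms > 20
```

Saved as a scheduled alert over a short window (for example, the last 15 minutes) with the trigger condition "number of results is greater than 0", it should fire only when at least one host and mount point crosses the threshold.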

screenshot 1: (image not available)