Need to schedule IO wait alerts on Splunk

vikram_m
Path Finder

Our Splunk infrastructure runs on Azure, and we recently faced a major issue where I/O wait time was high, which affected indexing and all the data pipeline queues.

As an RCA item, we have decided to schedule I/O wait time alerts on the infrastructure, so that we can tell whether something is wrong in our Splunk config or whether Azure storage is what is backing up the data pipeline queues.

Please let us know how we can schedule I/O wait alerts in Splunk.

Thanks.
Vikram.

adonio
Ultra Champion

Hello there, I might be off with my answer, but I thought it was worth bringing to your attention, and I needed to post it as an answer in order to attach screenshots.
You can use the DMC (Distributed Management Console, called the Monitoring Console or MC in newer versions). It has pre-built alerts on indexing queues and indexer performance (see screenshot 1).
You can also navigate in the DMC to Resource Usage: Machine and scroll down to see the I/O graph.
Opening that panel in Search shows the following:

    `dmc_set_index_introspection` sourcetype=splunk_resource_usage component=IOStats host=<yourHost>
    | eval mount_point = 'data.mount_point'
    | eval reads_ps = 'data.reads_ps'
    | eval writes_ps = 'data.writes_ps'
    | eval interval = 'data.interval'
    | eval op_count = (reads_ps + writes_ps) * interval
    | eval avg_service_ms = 'data.avg_service_ms'
    | eval avg_wait_ms = 'data.avg_total_ms'
    | eval cpu_pct = 'data.cpu_pct'
    | eval network_pct = 'data.network_pct'
    | `dmc_timechart_for_iostats` per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct
    | eval iops = round(iops)
    | eval avg_cpu_pct = round(avg_cpu_pct)
    | eval avg_service_ms = round(avg_service_ms)
    | eval avg_wait_ms = round(avg_wait_ms)
    | eval avg_network_pct = round(avg_network_pct)
    | fields _time, iops, avg_wait_ms
    | rename avg_wait_ms as "Wait Time"

You can modify this search and use it as a base for your alerts.
Hope it helps.
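
For illustration, here is a minimal sketch of how the panel search might be trimmed down into an alert search. It assumes the `dmc_set_index_introspection` macro resolves to `index=_introspection` (the default index for resource-usage data), keeps the panel's treatment of `data.avg_total_ms` as the wait time, and uses a 50 ms threshold that is purely hypothetical; pick a value based on what your own disks look like when healthy.

```
index=_introspection sourcetype=splunk_resource_usage component=IOStats host=<yourHost>
| stats avg(data.avg_total_ms) as avg_wait_ms by host, data.mount_point
| eval avg_wait_ms = round(avg_wait_ms)
| where avg_wait_ms > 50
```

Saved as a scheduled alert (for example, run every 15 minutes over the last 15 minutes, triggering when the number of results is greater than 0), this would return one row per host and mount point whose average wait exceeded the threshold. Splitting by mount point may also help show which volume is slow when you are trying to separate an Azure storage problem from a Splunk config problem.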

Screenshot 1: pre-built DMC alerts (image not available)
