Getting Data In

Need to schedule IO wait alerts on Splunk

vikram_m
Path Finder

Our Splunk infrastructure runs on Azure, and we recently hit a major issue where I/O wait time was high, which affected indexing and all of the data pipeline queues.

As an RCA action item, we have decided to schedule I/O wait time alerts on the infrastructure, so that we can tell whether something is wrong in our Splunk configuration or whether Azure storage is what is backing up the data pipeline queues.

Please let us know how we can schedule I/O wait alerts in Splunk.

Thanks.
Vikram.


adonio
Ultra Champion

Hello there. I might be off with my answer, but I thought it was worth bringing to your attention, and I needed to post it as an answer in order to attach screenshots.
You can use the DMC (or MC), which has pre-built alerts on indexing queues and indexer performance (see screenshot 1).
You can also navigate in the DMC to Resource Usage: Machine and scroll down to see the I/O graph.
Opening that panel in Search shows the following:

 `dmc_set_index_introspection` sourcetype=splunk_resource_usage component=IOStats host=<yourHost>
 | eval mount_point = 'data.mount_point'
 | eval reads_ps = 'data.reads_ps'
 | eval writes_ps = 'data.writes_ps'
 | eval interval = 'data.interval'
 | eval op_count = (reads_ps + writes_ps) * interval
 | eval avg_service_ms = 'data.avg_service_ms'
 | eval avg_wait_ms = 'data.avg_total_ms'
 | eval cpu_pct = 'data.cpu_pct'
 | eval network_pct = 'data.network_pct'
 | `dmc_timechart_for_iostats` per_second(op_count) as iops, avg(data.cpu_pct) as avg_cpu_pct, avg(data.avg_service_ms) as avg_service_ms, avg(data.avg_total_ms) as avg_wait_ms, avg(data.network_pct) as avg_network_pct
 | eval iops = round(iops)
 | eval avg_cpu_pct = round(avg_cpu_pct)
 | eval avg_service_ms = round(avg_service_ms)
 | eval avg_wait_ms = round(avg_wait_ms)
 | eval avg_network_pct = round(avg_network_pct)
 | fields _time, iops, avg_wait_ms
 | rename avg_wait_ms as "Wait Time"

You can modify this search and use it as the base for your alerts.
Hope it helps.
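
As a rough illustration of how the search above could be trimmed down into an alert condition, here is a minimal sketch. It reuses the `dmc_set_index_introspection` macro and the `data.avg_total_ms` and `data.mount_point` fields from the panel search. The 50 ms threshold is purely a placeholder and not a recommended value, so tune it against your own baseline. The macro may only resolve inside the Monitoring Console app; if it does not, pointing the search at the introspection index directly is probably what you want, but check your own macro definition first.

```spl
`dmc_set_index_introspection` sourcetype=splunk_resource_usage component=IOStats
| eval mount_point = 'data.mount_point'
| stats avg(data.avg_total_ms) as avg_wait_ms by host, mount_point
| eval avg_wait_ms = round(avg_wait_ms, 1)
| where avg_wait_ms > 50
```

With a search shaped like this, one option is Save As > Alert, run it on a schedule (for example every 5 minutes over the last 15 minutes, both of which are assumptions to adjust), and trigger when the number of results is greater than 0. Each result row is then a host and mount point whose average wait time over the window exceeded the threshold.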

screenshot 1: [image not included]
