We are periodically seeing spikes in Storage I/O Saturation (Monitoring Console > Resource Usage: Deployment). When split by host, we can see this affects all six indexers nearly simultaneously on their /opt/splunkdata mount points. As expected, these spikes trigger Health Status notifications (warning or alert) throughout the day.
Of note: load averages regularly exceed 5 while CPU usage normally stays under 10% on each indexer (24 cores each), and RAM usage is around 30% per indexer. We are wondering whether our physical storage and/or network might be a bottleneck, or whether it's something on the Splunk side.
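Since the Linux load average also counts tasks blocked in uninterruptible I/O sleep, we suspect the load is I/O-driven rather than CPU-driven. Would something like this be a sensible first check during a spike (a minimal sketch; vmstat ships with most distributions)?

    # Print system-wide stats every 5 seconds. During a spike, a high 'b'
    # column (processes blocked on I/O) plus high 'wa' (% CPU time waiting
    # on I/O) alongside low 'us'/'sy' would implicate storage rather than compute.
    vmstat 5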
Speaking as a Splunk admin beginner: could someone please suggest where we should start troubleshooting these spikes, or explain in more detail what the Storage I/O Saturation indicator actually measures?
We are on Enterprise 9.0.4 across the board and are considering moving to the latest release sooner rather than later.
Thank you!
Hi @tretrigh,
in these situations the issue is usually the storage:
What kind of storage are you using?
Are you sure you're getting at least the required 800 IOPS from your storage?
You can measure your storage performance with a tool such as Bonnie++.
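For example, something like this (a minimal sketch; the scratch path, the 64g size, and the splunk user are assumptions, and the size should be at least twice the host's RAM so the page cache can't mask the disks):

    # Create a scratch directory on the volume under test (path is an assumption).
    mkdir -p /opt/splunkdata/bonnie_test && chown splunk /opt/splunkdata/bonnie_test
    # -d: directory to test; -s: total data size (>= 2x RAM to defeat caching);
    # -n 0: skip the small-file creation tests; -u: user to run as when started as root.
    bonnie++ -d /opt/splunkdata/bonnie_test -s 64g -n 0 -u splunk

The random-seeks figure it reports is a rough proxy for IOPS; fio can measure IOPS more directly if it's available.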
Ciao.
Giuseppe
Storage is all SSD on NetApp with RAID-DP, connected over a Fibre Channel backend. I'm waiting to hear back so we can match the times where we're seeing spikes with what the infrastructure team sees. I'm unsure about the IOPS limits at this point.
Of note: I've learned that the OS disk and the /splunkdata disk for each indexer all sit on the same aggregate. As I'm unfamiliar with NetApp, I don't know whether this matters (I'm assuming it's okay?).
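In the meantime, I'm planning to log timestamped device stats on each indexer so we have something concrete to line up with what the storage team sees (a rough sketch; assumes the sysstat package is installed, and the log path is arbitrary):

    # -x: extended device stats (r/s, w/s, await, %util); -t: timestamp each
    # report; 30: one report every 30 seconds until killed. Note the first
    # report shows averages since boot, so ignore it.
    iostat -xt 30 >> /var/tmp/splunkdata_iostat.log &

Sustained high await/%util on the device backing /opt/splunkdata at the spike times would point at the array or the FC path rather than at Splunk.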
Hi @tretrigh,
SSD storage should deliver the required performance.
Are all the indexers on the same node or on different ones?
Are resources shared or dedicated? They should be dedicated.
Maybe there's a momentary issue on the NetApp side.
Ciao.
Giuseppe