Monitoring Splunk

Troubleshooting High Storage I/O Saturation Spikes?

tretrigh
Path Finder

We are periodically seeing spikes of Storage I/O Saturation (Monitoring Console > Resource Usage: Deployment).  When split by host we can see that this is affecting all 6 indexers nearly simultaneously for the /opt/splunkdata mount points.  As expected, this triggers the Health Status notification throughout the day (warning or alert).

To note, Load Averages are regularly > 5%, with CPU usage normally under 10% for each indexer (24 cores each). RAM usage is around 30% per indexer. We are wondering whether our physical storage and/or network might be a bottleneck, or whether it's something on the Splunk side.

For a Splunk Admin beginner, could someone please offer some suggestions on where we could start troubleshooting these spikes or explain in more detail the specifics around Storage I/O Saturation?

We are on Enterprise 9.0.4 across the board and are considering applying the most recent update sooner rather than later.

Thank you!

gcusello
SplunkTrust

Hi @tretrigh,

Usually the issue in these situations is the storage.

Which kind of storage are you using?

Are you sure your storage delivers at least the required 800 IOPS?

You can measure your storage performance with a tool such as Bonnie++.
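
For a quick look at what the disks are doing before running a full benchmark, here is a minimal sketch, assuming Linux indexers, that derives IOPS and a busy percentage from two samples of /proc/diskstats. The function names are only illustrative, and the busy percentage is similar in spirit to the %util column of iostat, which is not necessarily the exact number the Monitoring Console reports.

```python
import time


def parse_diskstats(text):
    """Parse /proc/diskstats content.

    Returns {device: (reads_completed, writes_completed, ms_doing_io)}.
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        # fields: major, minor, name, reads completed, ..., writes completed (7),
        # ..., milliseconds spent doing I/O (12), weighted milliseconds (13)
        stats[fields[2]] = (int(fields[3]), int(fields[7]), int(fields[12]))
    return stats


def io_rates(before, after, interval_s):
    """Return {device: (iops, busy_pct)} between two parsed samples."""
    rates = {}
    for dev, (r1, w1, t1) in after.items():
        if dev not in before:
            continue
        r0, w0, t0 = before[dev]
        iops = ((r1 - r0) + (w1 - w0)) / interval_s
        busy_pct = min(100.0, (t1 - t0) / (interval_s * 1000.0) * 100.0)
        rates[dev] = (iops, busy_pct)
    return rates


def sample_rates(interval_s=5):
    """Take two samples of /proc/diskstats interval_s seconds apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return io_rates(before, after, interval_s)
```

On an indexer you could call sample_rates(5) during a spike and look at the device that backs /opt/splunkdata. This only shows what the host sees; a benchmark such as Bonnie++ is still needed to find out what the storage can actually deliver.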

Ciao.

Giuseppe

tretrigh
Path Finder

Storage is all SSD on NetApp using RAID-DP, connected over a Fibre Channel backend. I'm waiting to hear back from the Infrastructure team about matching the times of our spikes with what they see on their side. I'm unsure about the IOPS limits at this point.
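
For lining up the spike times with whatever Infrastructure reports from the NetApp side, a rough sketch along these lines might help. Everything in it is an assumption for illustration: the spike timestamps would be exported by hand from the Monitoring Console, the event windows would come from the storage team, and the two-minute slack is an arbitrary allowance for clock differences.

```python
from datetime import datetime, timedelta


def match_spikes(spikes, windows, slack=timedelta(minutes=2)):
    """Split spike timestamps into those that fall inside a storage event window
    (widened by `slack` on both sides) and those that do not.

    spikes:  list of datetime
    windows: list of (start, end, label) tuples
    Returns (matched, unmatched); matched is a list of (spike, [labels]).
    """
    matched, unmatched = [], []
    for spike in spikes:
        labels = [
            label
            for start, end, label in windows
            if start - slack <= spike <= end + slack
        ]
        if labels:
            matched.append((spike, labels))
        else:
            unmatched.append(spike)
    return matched, unmatched
```

If most spikes match a storage-side event, that points toward the array; if most are unmatched, the Splunk side deserves a closer look.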

To note, I learned that the OS disk (/) and the /splunkdata disk for each indexer are all on the same aggregate. As I am unfamiliar with NetApp, I don't know whether this matters, but I am assuming it is okay.

gcusello
SplunkTrust

Hi @tretrigh,

Storage on SSD should give the required performance.

Are all the indexers on the same node or on different ones?

Are the resources shared or dedicated? They should be dedicated.

Maybe there's a momentary issue on the NetApp.

Ciao.

Giuseppe
