Disk IO plumbing tests for Splunk Infrastructure

cameronjust
Path Finder

Hi All,

I've been constantly battling customer environments that don't follow best-practice guidelines for their infrastructure. This usually happens when they deploy in VMware environments. The most notable issues are assigning too few CPU cores and not pinning them as recommended in the VMware whitepaper.

Another common issue I have encountered recently is substandard disk I/O. Splunk really only recommends a minimum IOPS figure, which is a pretty broad definition. I found this comment from Johnathan on Answers from a few years back, which explains the importance of disk I/O in Splunk:

https://answers.splunk.com/answers/298/can-i-run-splunk-in-a-vm-are-there-any-issues-or-tricks-i-sho...

All that said, infrastructure teams usually believe (or are told by hardware vendors) that they have given Splunk the best I/O available. Then, when Splunk performs poorly, they blame Splunk instead of their hardware. Instead of taking their (or the vendor's) word for it, I've been trying to come up with a good way to benchmark their systems beyond the basic dd or Windows-equivalent tests.

I've been putting together a Splunk TA to stress test their disk subsystems, both hot and cold, using FIO (https://github.com/axboe/fio).

My current tests are quite broad, but I was hoping to get some advice on the best FIO settings to simulate the disk usage of a Splunk indexer. Our engineers think a good generic test would be random disk I/O with 75% writes and 25% reads (see test 1 below).

Does anyone have any suggestions for better, more Splunk-specific tests?

Current tests

# Random Read Writes (25% reads 75% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_rw_25_75 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=25 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=C\:random_rw_25_75.json

# Sequential Read Writes (25% reads 75% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_rw_25_75 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=rw --rwmixread=25 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=C\:sequential_rw_25_75.json

# Random Read Writes (75% reads 25% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_rw_75_25 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=random_rw_75_25.json

# Sequential Read Writes (75% reads 25% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_rw_75_25 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=rw --rwmixread=75 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=sequential_rw_75_25.json

# Random Reads
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_read --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randread --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=random_read.json

# Sequential Reads
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_read --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=read --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=sequential_read.json

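Since the six commands differ only in the job name, the --readwrite mode and the read percentage, one way to keep the labels and flags from drifting apart is to generate them from a small table. This is only a rough sketch: the helper name build_command and the TESTS table are made up for illustration, the paths are copied from the commands above, and the output files are written relative to the working directory as in the later tests.

```python
# Hypothetical helper: builds the fio command lines from one table so the
# job name, the read/write mix and the output file always agree.
FIO = r"C:\Temp\fio.exe"
TEST_FILE = r"C\:/fio-test.dat"  # drive colon escaped, as in the post

# (job name, --readwrite mode, read percentage or None for read-only jobs)
TESTS = [
    ("random_rw_25_75", "randrw", 25),
    ("sequential_rw_25_75", "rw", 25),
    ("random_rw_75_25", "randrw", 75),
    ("sequential_rw_75_25", "rw", 75),
    ("random_read", "randread", None),
    ("sequential_read", "read", None),
]


def build_command(name, mode, read_pct):
    """Return the fio argument list for one test."""
    args = [
        FIO, "--runtime=60", "--thread", "--randrepeat=1",
        "--ioengine=windowsaio", "--direct=1", "--gtod_reduce=1",
        f"--name={name}", f"--filename={TEST_FILE}", "--create_on_open=0",
        "--bs=4k", "--iodepth=64", "--size=1G", f"--readwrite={mode}",
    ]
    # --rwmixread only makes sense for mixed read/write jobs.
    if read_pct is not None:
        args.append(f"--rwmixread={read_pct}")
    args += [
        "--disable_lat=0", "--disable_clat=0", "--disable_slat=0",
        "--output-format=json+", f"--output={name}.json",
    ]
    return args


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run() to execute them.
    for test in TESTS:
        print(" ".join(build_command(*test)))
```

Adding a more Splunk-specific test then means adding one row to the table instead of copying and editing a whole command line.
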
Here is a sample output from one of the tests. Note the IOPS for both read/write and the latency records at the end.

https://pastebin.com/XUnhRK24
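
For anyone wanting to pull those numbers out of the JSON, here is a minimal sketch of a parser. It assumes the layout written by fio 3.x (a top-level "jobs" list, per-direction "iops", and completion latency percentiles under "clat_ns"); older fio versions report latency under different keys, so treat the field names as assumptions to check against your own output. The function names summarize and load are hypothetical.

```python
import json


def summarize(result):
    """Map job name -> {"read"/"write": (IOPS, p99 completion latency in ms)}.

    Assumes fio 3.x JSON field names; the latency is None when the
    percentile is not present in the output.
    """
    summary = {}
    for job in result.get("jobs", []):
        per_direction = {}
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            percentiles = stats.get("clat_ns", {}).get("percentile", {})
            p99_ns = percentiles.get("99.000000")
            per_direction[direction] = (
                stats.get("iops", 0.0),
                None if p99_ns is None else p99_ns / 1e6,  # ns -> ms
            )
        summary[job.get("jobname", "unknown")] = per_direction
    return summary


def load(path):
    """Read one fio JSON output file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize(json.load(handle))
```

Calling load("random_read.json") on one of the output files above would return the read and write figures for that job, which could then be indexed as a single event per test.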

Thoughts?
