
Disk IO plumbing tests for Splunk Infrastructure

cameronjust
Path Finder

Hi All,

I've been constantly battling customer environments that don't follow best-practice guidelines for their infrastructure. This is almost always when they deploy on VMware. The most notable issues are assigning too few CPU cores and not pinning them as recommended in the VMware whitepaper.

However, another common issue I have encountered recently is substandard disk I/O. Splunk really only recommends a minimum IOPS figure, which is a pretty broad definition. I found this comment from Johnathan on Splunk Answers a few years back that explains the importance of disk I/O in Splunk:

https://answers.splunk.com/answers/298/can-i-run-splunk-in-a-vm-are-there-any-issues-or-tricks-i-sho...

That said, infrastructure teams usually believe (or are told by hardware vendors) that they have given Splunk the best I/O available. Then, when Splunk performs poorly, they blame Splunk instead of their hardware. Rather than taking their (or the vendor's) word for it, I've been trying to come up with a good way to benchmark their systems beyond the basic dd test or its Windows equivalent.

I've been putting together a Splunk TA to stress test their disk subsystems, both the hot and cold storage, using FIO (https://github.com/axboe/fio).

My current tests are quite broad, but I was hoping to get some advice on the best FIO settings to simulate the disk usage of a Splunk indexer. Our engineers think a good generic test would be random disk I/O with 75% writes and 25% reads (see the first test below).

Does anyone have suggestions for better, more Splunk-specific tests?

Current tests

# Random Read Writes (25% reads 75% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_rw_25_75 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=25 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=random_rw_25_75.json

# Sequential Read Writes (25% reads 75% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_rw_25_75 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=rw --rwmixread=25 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=sequential_rw_25_75.json

# Random Read Writes (75% reads 25% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_rw_75_25 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=random_rw_75_25.json

# Sequential Read Writes (75% reads 25% writes)
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_rw_75_25 --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=rw --rwmixread=75 --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=sequential_rw_75_25.json

# Random Reads
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=random_read --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=randread --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=random_read.json

# Sequential Reads
"C:\Temp\fio.exe" --runtime=60 --thread --randrepeat=1 --ioengine=windowsaio --direct=1 --gtod_reduce=1 --name=sequential_read --filename=C\:/fio-test.dat --create_on_open=0 --bs=4k --iodepth=64 --size=1G --readwrite=read --disable_lat=0 --disable_clat=0 --disable_slat=0 --output-format=json+ --output=sequential_read.json
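Since the six commands differ only in job name, read/write mode and read percentage, generating them from one table is a cheap way to keep each `--rwmixread` value in step with its label (hand-copied command lines make that easy to get wrong). This is a minimal Python sketch, assuming the same fio.exe and test-file paths as above; `TESTS`, `COMMON` and `build_command` are hypothetical names, and the script only prints the command lines unless the `subprocess.run` line is uncommented.

```python
import subprocess

# Paths copied from the commands above; adjust for the target host.
FIO = r"C:\Temp\fio.exe"

# Options shared by every test (same values as the hand-written commands).
COMMON = [
    "--runtime=60", "--thread", "--randrepeat=1", "--ioengine=windowsaio",
    "--direct=1", "--gtod_reduce=1", r"--filename=C\:/fio-test.dat",
    "--create_on_open=0", "--bs=4k", "--iodepth=64", "--size=1G",
    "--disable_lat=0", "--disable_clat=0", "--disable_slat=0",
    "--output-format=json+",
]

# (job name, fio --readwrite mode, percentage of reads; None for read-only tests)
TESTS = [
    ("random_rw_25_75", "randrw", 25),
    ("sequential_rw_25_75", "rw", 25),
    ("random_rw_75_25", "randrw", 75),
    ("sequential_rw_75_25", "rw", 75),
    ("random_read", "randread", None),
    ("sequential_read", "read", None),
]


def build_command(name, mode, read_pct):
    """Return the fio argument list for one test."""
    cmd = [FIO, *COMMON, f"--name={name}", f"--readwrite={mode}"]
    if read_pct is not None:
        # --rwmixread is the share of reads; writes get the remainder.
        cmd.append(f"--rwmixread={read_pct}")
    # One JSON result file per job, named after the job.
    cmd.append(f"--output={name}.json")
    return cmd


if __name__ == "__main__":
    for test in TESTS:
        cmd = build_command(*test)
        # list2cmdline renders the list as a Windows-style command line.
        print(subprocess.list2cmdline(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run fio
```

Adding a test then means adding one row to `TESTS`, for example a different block size or queue depth, without retyping the shared options.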

Here is sample output from one of the tests. Note the read and write IOPS and the latency figures at the end.

https://pastebin.com/XUnhRK24
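To compare runs across customer sites, the IOPS and latency figures can be pulled out of the JSON result files rather than read by eye. This Python sketch assumes the fio 3.x JSON layout: a `jobs` list whose entries have `read` and `write` sections containing `iops` and completion-latency stats in nanoseconds under `clat_ns`. Older fio builds report latency in microseconds under different keys, so check against a real output file such as the one linked above. `summarise` is a hypothetical helper name, and any percentile missing from the file is reported as 0.

```python
import json
import sys


def summarise(report):
    """Extract per-direction IOPS and completion latency (ms) from a fio JSON report.

    Assumes fio 3.x key names ("jobs", "read"/"write", "iops", "clat_ns").
    """
    rows = []
    for job in report.get("jobs", []):
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            iops = stats.get("iops", 0.0)
            if iops == 0:
                continue  # this job did no I/O in this direction
            clat = stats.get("clat_ns", {})
            pct = clat.get("percentile", {})
            rows.append({
                "job": job.get("jobname"),
                "dir": direction,
                "iops": iops,
                # fio reports these in nanoseconds; convert to milliseconds.
                "clat_mean_ms": clat.get("mean", 0.0) / 1e6,
                # fio keys percentiles as strings such as "99.000000".
                "clat_p99_ms": pct.get("99.000000", 0.0) / 1e6,
            })
    return rows


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as fh:
            for row in summarise(json.load(fh)):
                print(f"{row['job']:<22} {row['dir']:<5} "
                      f"{row['iops']:>10.0f} IOPS  "
                      f"mean clat {row['clat_mean_ms']:.2f} ms  "
                      f"p99 {row['clat_p99_ms']:.2f} ms")
```

Usage, assuming the script is saved under the hypothetical name `fio_summary.py`: `python fio_summary.py random_rw_25_75.json random_read.json`.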

Thoughts?
