I'm constantly battling customer environments that don't follow infrastructure best-practice guidelines, usually VMware deployments. The most notable issues are assigning too few CPU cores and not pinning them as recommended in the VMware whitepaper.
However, another common issue I've encountered recently is substandard disk I/O. Splunk really only publishes a minimum IOPS figure, which is a pretty broad definition. I found this comment from Johnathan on Answers a few years back which explains the importance of disk I/O in Splunk.
All that said, infrastructure teams usually believe (or are told by their hardware vendors) that they've given Splunk the best I/O available, and when Splunk performs poorly they blame Splunk rather than their hardware. Instead of taking their (or the vendor's) word for it, I've been trying to come up with a good way to benchmark their systems beyond the basic dd (or Windows equivalent) tests.
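For context, by "basic dd test" I mean something like the following (the path and sizes are arbitrary examples; `conv=fdatasync` forces the data to disk so the page cache doesn't inflate the number):

```shell
# Quick-and-dirty sequential write test: write 1 GiB of zeros and
# flush to disk before reporting throughput. This only measures
# sequential write speed, not the random I/O an indexer generates.
dd if=/dev/zero of=/tmp/dd_testfile bs=1M count=1024 conv=fdatasync

# clean up the test file afterwards
rm -f /tmp/dd_testfile
```

The problem is that this reports sequential throughput only, and says nothing about random read/write latency under concurrent load, which is much closer to what an indexer actually does.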
My current tests are quite broad, but I was hoping to get some advice on the best fio settings to simulate the disk usage of a Splunk indexer. Our engineers think a good generic test would be random disk I/O with 75% writes and 25% reads (see test 1 below).
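As an illustration, a fio job matching that 75/25 random mix might look like this (the block size, job count, queue depth and runtime here are my assumptions as a starting point, not anything Splunk documents):

```ini
; indexer-sim.fio -- rough approximation of indexer bucket I/O
; run with: fio indexer-sim.fio
[global]
ioengine=libaio
direct=1          ; bypass the page cache so we measure the disk
time_based=1
runtime=300       ; run for 5 minutes to get past any burst caching
group_reporting=1

[indexer-mix]
rw=randrw         ; mixed random reads and writes
rwmixwrite=75     ; 75% writes / 25% reads, per our engineers' guess
bs=64k            ; assumed block size, not a Splunk-documented value
size=10g
numjobs=4         ; simulate a few concurrent pipelines/searches
iodepth=16
```

I'd be interested in whether these parameters (particularly the block size and read/write mix) are anywhere near what an indexer actually generates.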
Does anyone have suggestions for better, more Splunk-specific tests?