The planning docs here - http://www.splunk.com/base/Documentation/latest/Installation/CapacityplanningforalargerSplunkdeploym... - recommend the following storage hardware:
4x 300GB SAS hard disks at 10,000 RPM each, in RAID 10, capable of 800 I/O operations per second (IOPS)
Can anyone clarify this a bit further please? We need more I/O information: the 800 IOPS sustained - is that read or write? Random or sequential? Large or small block?
When measuring maximum I/O operations per second, the bounding factor is typically seeks, which gives the same value for both reads and writes, and is only meaningful for the random case. Most testing tools use a mix of reads and writes to exercise the device's real capabilities. Bonnie++ does a majority of reads and a minority of writes, which is pretty similar to Splunk's overall load in a well-used deployment, or in the case where a few searches are actually running, which is what you want to optimize for anyway.
As for large versus small blocks, I'm not sure which will produce numbers closer to Splunk's behavior. Usually block size affects throughput much more significantly than it affects I/O operations per second.
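As a rough sanity check on that seek-bound ceiling, you can estimate per-disk random IOPS from the rotational speed and an assumed average seek time (the 3.5 ms seek figure below is an assumption for illustration, not a measured value for any particular drive):

```shell
# Seek-bound IOPS estimate for a single 10k rpm disk (illustrative only):
#   IOPS ~= 1000 ms / (avg seek time + avg rotational latency)
# where avg rotational latency = half a rotation = 60000 / rpm / 2 ms.
awk 'BEGIN { rpm = 10000; seek_ms = 3.5; rot_ms = 60000 / rpm / 2;
             printf "~%d IOPS per disk\n", 1000 / (seek_ms + rot_ms) }'
```

Multiply by the number of spindles that can service random reads independently (all four in RAID 10) to get a ballpark figure for the array.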
As for bonnie++, I typically get the physical memory size of the box (not the VM) and then run
bonnie++ -d /somewhere/on/your/intended/fs -r 16384 -b
This is for a 4GB box. Essentially I'm telling bonnie++ to do 4x the work it would autotune for, to confidently defeat various forms of caching.
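A small sketch of that recipe, assuming a Linux host where physical RAM can be read from /proc/meminfo (the target directory is a placeholder, as above):

```shell
# Tell bonnie++ the box has 4x its real RAM, so the test working set
# comfortably defeats the page cache. Linux-only: reads /proc/meminfo.
MEM_MB=$(awk '/MemTotal/ { print int($2 / 1024) }' /proc/meminfo)
echo "Physical RAM: ${MEM_MB} MB; telling bonnie++: $((MEM_MB * 4)) MB"
bonnie++ -d /somewhere/on/your/intended/fs -r $((MEM_MB * 4)) -b
```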
Note that as of 5.0.2, the topic referred to in this question has been substantially revised and expanded.
Field 18, Random seeks.
Alternatively, you can pretty-print the last line of results using the script included with bonnie++, e.g.:
echo "1.96,1.96,hostname,1,1365178857,192G,,,,blah,,,123456789,,," | bon_csv2html > bonnie_results.html
And look at the "Random Seeks" /sec column.
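If you just want the seeks number without the HTML step, you can also pull field 18 straight out of the machine-readable line with awk (the CSV row below is a made-up illustration, and 812.4 is not a real measurement):

```shell
# Field 18 of bonnie++'s machine-readable CSV line is Random Seeks/sec.
echo "1.96,1.96,hostname,1,1365178857,192G,,,,,,,,,,,,812.4" \
  | awk -F, '{ print "Random seeks/sec: " $18 }'
# prints: Random seeks/sec: 812.4
```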
I'm also trying to plan for a Splunk deployment and am struggling with a couple of things regarding disk I/O. My test box has 48GB of memory, so a 2x RAM test is 96GB, which takes a while. With bonnie++, which field should I reference for the IOPS to compare against the recommended 800 IOPS?
IOPS is mostly a factor of disk/RAID performance and typically won't use all that RAM. I also have a server with many CPUs and a large amount of RAM, and my Splunk server is still very slow due to disk I/O. I didn't have much guidance when I first set up my server, but now Splunk has some excellent docs and pointers at http://docs.splunk.com/Documentation/Splunk/5.0.2/Installation/Referencehardware
It would be awesome if someone from Splunk made a few bonnie++ parameter examples that would simulate a few real-world examples of Splunk usage!
No bonnie++ invocation will really simulate the Splunk load. However, I can try to figure out (remember) the more useful aspects of its invocation.