We are testing out different RAID configurations for our new Splunk indexers using bonnie++ and have found some unexpected results.
Even though this article claims that RAID 5 "will offer the worst performance," we found the highest random seeks/sec with RAID 5, which I believe is the IOPS number that Splunk users always quote.
We tested RAID 5, 10, and 50 using 6 disks. We ran bonnie++ with test sizes of 2x and 4x the RAM (16G) using:
bonnie++ -d /opt/bonnie_test -s 32g -qfb
bonnie++ -d /opt/bonnie_test -s 64g -qfb
Here are the averages of 4 runs each:
Before seeing the results, I expected RAID 10 to post superior numbers across the board.
Any ideas/thoughts why RAID 5 performed so well? Or am I misinterpreting the data?
So, from this data, should we go with RAID 50 for the faster seeks, or with RAID 10, as most Splunkers recommend, with its lower seeks but better sequential block I/O?
I realize all the numbers are above the recommended 800 IOPS, but we'd like to select the ideal configuration.
Well, that's not what is written in the bonnie++ man pages:
"NB You can specify the size in giga-bytes or the chunk-size in kilo-bytes if you add g or k to the end of the number respectively."
One caveat: bonnie++ does not understand sizes like -s 32g. It will read the 32 and run a 32MB test. Instead, use 32768 and 65536.
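If in doubt, you can sidestep the suffix question entirely by giving the size in megabytes and passing -r so bonnie++ knows the RAM size. A minimal sketch, assuming the 16G of RAM mentioned in the question:
bonnie++ -d /opt/bonnie_test -s 32768 -r 16384 -qfb
bonnie++ -d /opt/bonnie_test -s 65536 -r 16384 -qfb
The size field that bonnie++ reports in its results (and in the trailing CSV line) shows what it actually tested, so you can confirm which way your build interpreted the size.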
You could always do a test with Splunk itself.
You could try V's app:
http://splunk-base.splunk.com/apps/22339/field-perf-benchmark
or just throw a bunch of data at it from some forwarders, or even just from files already on the disk. A query like this would give you a decent number:
* | eval bytes=length(_raw) | bucket span=1m _indextime | stats sum(bytes) as bytes_per_minute by _indextime | stats min(bytes_per_minute) avg(bytes_per_minute) max(bytes_per_minute)
Run it over All time.
I don't know if your test parameters have anything to do with this or not; perhaps the -f and -b options? These may not mirror well how Splunk works.
The performance penalty with RAID5 (vs RAID10) occurs when you have small writes (less than a full RAID stripe). That puts the controller in the position of having to read in the whole stripe, apply the change, recompute parity, and then rewrite the whole stripe. As long as you're writing enough data to flush a full stripe, RAID5 can compute the parity and write it all at once.
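To put rough numbers on that penalty (my own back-of-the-envelope figures, assuming roughly 200 random IOPS per 15K drive and ignoring controller cache):
RAID 10 small random write: 2 disk I/Os (data + mirror) -> 6 x 200 / 2 = ~600 write IOPS
RAID 5 small random write: 4 disk I/Os (read old data, read old parity, write data, write parity) -> 6 x 200 / 4 = ~300 write IOPS
RAID 5 full-stripe write: parity is computed from the new data alone, so the read-modify-write penalty disappears
Note also that bonnie++'s random-seek test is mostly reads, where every spindle can contribute on any of these layouts, which would help explain why RAID 5 looks so strong in that column.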
Full-stripe writes are also one reason NetApp gets such good performance from WAFL + RAID4 or RAID-DP: WAFL plus battery-backed NVRAM does a fantastic job of coalescing writes and making sure they almost always land as full RAID stripes.
Part of what you're seeing here could also be coming from battery-backed cache on the RAID controller, but usually those are too small to greatly increase the IOPS.
Because it has to update the rawdata, the tsidx, and the bloomfilter files nearly in concert, Splunk will seldom issue a single write large enough for a full-stripe write with RAID5.
I am honestly kind of stumped how you measured 800 IOPS out of your slowest example there (which happens to be a RAID10 example). With 6 drives, that implies each drive is capable of close to 287 IOPS, or, put another way, an average latency of under 3.5ms per drive. That is faster than even a Seagate Savvio 15K.3, which measures at 4.85ms, or about 206 IOPS.
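For reference, the conversion I'm using there is just the reciprocal of the average access time (my arithmetic from the quoted latencies, not a measurement):
IOPS per drive ~= 1000 / (average latency in ms)
1000 / 4.85 ms ~= 206 IOPS
1000 / 3.5 ms ~= 286 IOPS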
What type of server / controller / disk subsystem are you using?
One thing that I found with LSI MegaRAID in my own testing is the huge benefit of the battery-backed writeback cache. Having writes complete at effectively near-zero latency nearly doubled my bonnie++ numbers compared to running the cache in write-through mode.
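On MegaRAID controllers you can usually inspect and flip the cache policy from the OS with MegaCli; the binary name (MegaCli vs MegaCli64) and the exact flag spelling vary a bit between versions, so treat this as a sketch rather than gospel:
MegaCli -LDGetProp -Cache -LAll -aAll    # show current read/write cache policy per logical drive
MegaCli -LDSetProp WB -LAll -aAll        # switch to write-back (normally requires a healthy BBU)
MegaCli -LDSetProp WT -LAll -aAll        # switch to write-through, for comparison
MegaCli -AdpBbuCmd -GetBbuStatus -aAll   # check the battery
Re-running the same bonnie++ tests under both policies makes the effect easy to see.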
The -f and -b options come from the bonnie++ example given on the page I quoted:
http://wiki.splunk.com/Community:HardwareTuningFactors#Disk
The servers are Dell PE R610.
The RAID controller:
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
The only info I have on the disks, from the spec sheet, is:
300GB 15K RPM SA SCSI 6Gbps 2.5in Hotplug Hard Drive (342-2240) - Quantity 6