Running bonnie++ with this command on a RHEL 6 server with a filesystem striped across several LUNs from a Fibre Channel SAN array:
bonnie++ -d /sys_apps_01/splunk -s 512G -u splunk -fb
The bonnie++ result is: 189 Random Seeks
BUT.... The array reports 14,000 IOPS and 1.4GB/s throughput for the duration of the test.
Why the discrepancy between the bonnie++ results and the array's reporting? The array seems to be doing a massive amount of work during the test, but the bonnie++ application is not "feeling" all the work the array is performing.
Are there any "tweaks" that need to be made to the FS or IO Scheduler? Any other HBA driver tweaks that would be recommended?
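For reference, here is how I'm checking the current scheduler and queue settings on one of the LUNs (a sketch, not a recommendation; sdb is a placeholder device name and the values shown are illustrative):

```shell
# Show the current I/O scheduler for the device (the active one is in brackets)
cat /sys/block/sdb/queue/scheduler

# For SAN-backed LUNs, noop or deadline is often suggested over cfq,
# since the array does its own reordering:
echo noop > /sys/block/sdb/queue/scheduler

# Read-ahead and request-queue depth are the other common knobs
# (values here are examples only, not tuning advice):
blockdev --setra 4096 /dev/sdb
cat /sys/block/sdb/queue/nr_requests
```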
Any help would be greatly appreciated!
You should ask your storage team this question. Hopefully you've already changed the ulimits.
This is usually like comparing apples and oranges. A disk array produces many more I/O operations for the same host request than a single disk would, because it is using RAID levels across many spindles, and in most arrays those back-end operations get aggregated back into the front-end controller's statistics. So a single read operation might hit 100 spindles, creating a minimum of 100 IOPS. Also, bonnie++ is looking at file I/O: it could be doing 8K reads/writes, but depending on the file system, block sizes, sector alignment, etc., each one can get broken down into multiple actual I/Os at the disk level.
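The multiplication described above is easy to put numbers on. As an illustrative sketch (the figures are hypothetical, not measured from this array): the classic RAID-5 small-write penalty is 4 back-end disk I/Os per front-end write (read data, read parity, write data, write parity), so front-end and back-end IOPS counters diverge quickly.

```shell
# Illustrative arithmetic only: how front-end writes fan out at the back end.
# A RAID-5 small write costs 4 disk I/Os (read data, read parity,
# write data, write parity).
front_end_iops=200      # hypothetical host-side write rate
raid5_penalty=4         # standard RAID-5 small-write penalty
back_end_iops=$((front_end_iops * raid5_penalty))
echo "back-end IOPS: $back_end_iops"
# prints: back-end IOPS: 800
```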
Most importantly, what bonnie++ calls Random Seeks per Second isn't a direct correlate of an IOP, which causes confusion because we (Splunk) say IOPS instead of Random Seeks per Second. What makes for a successful storage platform for running hot/warm on Splunk is one that can consistently produce 800+ Random Seeks per Second in a bonnie++ test with a file size of at least 2x RAM (to defeat OS caching). We have seen examples of SAN arrays that don't perform well on this test but do perform well for Splunk, which has us working to provide a better testing harness for you to use in the near future. Stay tuned.
The array reports are front-end I/O reports, not back-end I/O. I am still confused how 189 random seeks would produce over 14,000 IOPS between the host and the array. Even with the filesystem/LVM/SCSI/FC layers multiplying those I/Os, I would not expect a 75x increase in IOPS...
That's where the 2nd half of my answer comes into play. bonnie++ Random Seeks per Second are not equivalent to IOPS (nor does bonnie++ claim they are). This is an issue I am working to resolve in our own documentation, although from what I can find, the references in our docs don't use bonnie++; they actually discuss number of spindles, rotation speeds, and RAID level, and derive expected IOPS from that (which would be relatively accurate).
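The spindle-based estimate mentioned above is simple arithmetic. A sketch with illustrative numbers (the spindle count and per-disk rating below are assumptions, not figures from this array; 15k RPM FC disks are commonly rated somewhere around 175-200 IOPS each):

```shell
# Rough raw-IOPS estimate from spindle count and per-disk rating.
# Both numbers are illustrative assumptions.
spindles=48             # hypothetical spindle count behind the LUNs
iops_per_spindle=180    # typical ballpark for a 15k RPM FC disk
estimate=$((spindles * iops_per_spindle))
echo "estimated raw IOPS: $estimate"
# prints: estimated raw IOPS: 8640
```

RAID level then reduces the usable write IOPS from that raw number, per the write penalties discussed earlier.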
This is an interesting serverfault post that summarizes things well. http://serverfault.com/questions/578250/interpreting-iops-these-bonnie-and-iostat-results
Using iostat while bonnie++ is running would give you an IOPS reading. I would guess that will be more in line with what you are experiencing on the array, but it is not what Splunk means when it says "IOPS". Again, I am working to correct all of this.
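To put a number on that, you can run `iostat -x 5` in a second terminal while bonnie++ is running and sum the r/s and w/s columns for the device. A sketch of the summing step (the sample line below is made up, not real output, and the field positions 4 = r/s, 5 = w/s assume the extended-stats layout of sysstat 9.x as shipped with RHEL 6):

```shell
# Sum r/s + w/s from one iostat -x device line to get that device's total IOPS.
# The sample line is fabricated for illustration; on a live system pipe
# `iostat -x 5` through the same awk instead.
sample="sdb 0.00 12.40 310.20 95.60 4963.20 1529.60 16.00 0.45 1.10 0.30 12.10"
echo "$sample" | awk '{ printf "%.1f IOPS\n", $4 + $5 }'
# prints: 405.8 IOPS
```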
Is this all bonnie++ gives? 189 random seeks? There should be more verbose output than this.
See this post for an example of the full output I'd expect:
They use the -qfb switch instead of -fb
Difficult to paste... Here is my attempt:
Version  1.96         ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
spn2stl55        512G           904424  83 654502  53         1300851  45 189.4  81
Latency                          844ms      532ms                235ms    95263us
                      ------Sequential Create------ --------Random Create--------
                      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                   16  8052  18 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency                 71us     258us     278us      63us       8us      36us
I guess it has to do with the difference between the IOPS bonnie++ sees and the IOPS the SAN array sees.
For example, if the SAN array has a RAID 1+0 set of 200 disks (100 disks mirrored to 100 disks) and a write touches each disk during the same second, it could report 200 IOPS, because it has performed one input/output on each of 200 disks... but bonnie++ only sees that the single write it issued completed, and could therefore report 1 IOPS. Hard to explain cleanly, but I feel your array is reporting total IOPS across the entire array, whereas bonnie++ is measuring against what appears to the OS as a single disk.
Maybe it's worth checking other sources like http://serverfault.com/questions/517051/can-i-determine-iops-on-a-disk-array-using-bonnie or http://recoverymonkey.org/2013/02/25/beware-of-benchmarking-storage-that-does-inline-compression/
This is most likely related to the disk array, so also don't forget to ask the vendor for input on the topic 😉