Bonnie++ report different from array report

trbn
New Member

I'm running bonnie++ with this command on a RHEL 6 server, against a filesystem striped across several LUNs from a Fibre Channel SAN array:
bonnie++ -d /sys_apps_01/splunk -s 512G -u splunk -fb

The bonnie++ result is: 189 Random Seeks

BUT.... The array reports 14,000 IOPS and 1.4GB/s throughput for the duration of the test.

Why the discrepancy between the bonnie++ results and the array reporting? The array seems to be doing a massive amount of work during the test, but the bonnie++ application is not "feeling" all the work the array is performing.

Are there any "tweaks" that need to be made to the FS or IO Scheduler? Any other HBA driver tweaks that would be recommended?
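(For reference on the scheduler question: on RHEL 6 the per-device I/O scheduler is exposed through sysfs. A sketch, with sda standing in for whatever device node the SAN LUN actually presents:)

```shell
# Show the current elevator for each block device; the bracketed entry
# is the active one (RHEL 6 offers noop, anticipatory, deadline, cfq).
for d in /sys/block/*/queue/scheduler; do
    [ -e "$d" ] || continue
    echo "$d: $(cat "$d")"
done

# noop or deadline is often suggested for SAN-backed LUNs, since the
# array does its own request reordering (switching requires root):
# echo noop > /sys/block/sda/queue/scheduler
```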

Any help would be greatly appreciated!

Thanks!!!!
Aaron


jmantor
Path Finder

I'm basically in the same boat as the OP. How could I tune things to get more Random Reads?



sdvorak_splunk
Splunk Employee

This is usually like comparing apples and oranges. A disk array produces many more I/O operations for the same instruction than a single disk would, because it spreads RAID levels across many spindles, and in most arrays this gets aggregated back at the front-end controller. So a single read operation might hit 100 spindles, creating a minimum of 100 IOPs. Also, bonnie++ is looking at file I/O. It might be issuing 8K reads/writes, but depending on the file system, block sizes, sector alignment, etc., each of those can be broken down into multiple actual I/Os at the disk level.
Most importantly, what bonnie++ calls Random Seeks per Second isn't directly correlated to an IOP, which causes confusion because we (Splunk) say IOPs instead of Random Seeks per Second. What makes for a successful storage platform for running hot/warm on Splunk is one that can consistently produce 800+ Random Seeks per Second in a bonnie++ test with a file size of at least 2x RAM (to defeat OS caching). We have seen examples of SAN arrays that don't perform well on this test but do perform well for Splunk, which has us working to provide a better testing harness for you to use in the near future. Stay tuned.
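To make the 2x RAM sizing concrete, here's a sketch that derives the -s value from /proc/meminfo (the path and user are reused from the question above):

```shell
# bonnie++ treats a bare -s number as megabytes; size the test file at
# twice MemTotal so the OS page cache cannot satisfy the reads.
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
test_mb=$(( ram_kb / 1024 * 2 ))
echo "bonnie++ -d /sys_apps_01/splunk -s ${test_mb} -u splunk -fb"
```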


trbn
New Member

The array reports are front-end I/O reports, not back-end I/O. I am still confused how 189 random seeks would produce over 14,000 IOPS between the host and the array. Even with the filesystem/LVM/SCSI/FC layers multiplying those I/Os, I would not expect a 75x increase in IOPS...

Aaron


sdvorak_splunk
Splunk Employee

That's where the second half of my answer comes into play. bonnie++ Random Seeks per Second are not equivalent to IOPs (nor does bonnie++ claim they are representative of IOPs). This is an issue I am working to resolve in our own documentation, although from what I can find, the references in our docs don't actually use bonnie++; they discuss number of spindles, rotation speeds, and RAID level, and derive expected IOPS from those (which would be relatively accurate).
This is an interesting serverfault post that summarizes things well: http://serverfault.com/questions/578250/interpreting-iops-these-bonnie-and-iostat-results
Running iostat while bonnie++ is running would give you an IOPs reading. I would guess that will be more in line with what you are seeing on the array, but it is not what Splunk means when it says "IOPS". Again, I am working to correct all of this.
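(If sysstat/iostat isn't installed, the same host-side ops-per-second figure can be derived from the raw counters in /proc/diskstats; fields 4 and 8 of each line are completed reads and writes. A sketch — it grabs the first listed device, so substitute the actual LUN or dm device:)

```shell
# Sample completed read/write counters one second apart; the delta is
# the host-side IOPS figure to compare with the array's front-end stats.
dev=$(awk 'NR==1 {print $3}' /proc/diskstats)   # first device; substitute your LUN
r1=$(awk -v d="$dev" '$3==d {print $4}' /proc/diskstats)
w1=$(awk -v d="$dev" '$3==d {print $8}' /proc/diskstats)
sleep 1
r2=$(awk -v d="$dev" '$3==d {print $4}' /proc/diskstats)
w2=$(awk -v d="$dev" '$3==d {print $8}' /proc/diskstats)
echo "$dev: $(( (r2 - r1) + (w2 - w1) )) IOPS"
```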


jkat54
SplunkTrust

Is this all bonnie++ gives? 189 random seeks? There should be more verbose output than this.

See this post for an example of the output I'd expect to see:

https://answers.splunk.com/answers/84337/how-can-i-use-bonnie-to-measure-iops.html

They use the -qfb switch instead of -fb.


trbn
New Member

Difficult to paste... Here is my attempt:

Version 1.96          ------Sequential Output------ --Sequential Input- --Random-
                      -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine          Size K/sec %CP  K/sec %CP  K/sec %CP K/sec %CP  K/sec %CP   /sec %CP
spn2stl55        512G           904424  83 654502  53          1300851  45  189.4  81
Latency                          844ms      532ms                 235ms    95263us
                      ------Sequential Create------ --------Random Create--------
                files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
spn2stl55          16  8052  18 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               71us      258us     278us      63us       8us      36us

jkat54
SplunkTrust

I guess it has to do with the difference between the IOPS bonnie++ sees and the IOPS the SAN array sees.

For example, if the SAN array has a RAID 10 set of 200 disks (100 disks mirrored to 100 disks) and it writes 1 bit to each disk during the same second, it could report 200 IOPS, because it has theoretically done one input/output per second on each of 200 disks. But bonnie++ only sees that it took 1 second to write that one thing, and therefore could report 1 IOPS. Hard to explain, but I feel your array is reporting total IOPS across the entire array, whereas bonnie++ reports IOPS against what appears to the OS as a single disk.
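(Running the thread's own numbers through that logic — just arithmetic, no claim about the array internals:)

```shell
# 189 bonnie++ seeks/sec vs 14,000 front-end IOPS works out to roughly
# a 74x fan-out across the filesystem, LVM stripe, and RAID layers.
seeks=189
array_iops=14000
echo "amplification: ~$(( array_iops / seeks ))x"   # prints: amplification: ~74x
```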


trbn
New Member

These are front end IOPS and BW numbers from the array for the specific server, not the array wide IOPS or backend IOPS.

Aaron


MuS
SplunkTrust

Maybe it's worth checking other sources like http://serverfault.com/questions/517051/can-i-determine-iops-on-a-disk-array-using-bonnie or http://recoverymonkey.org/2013/02/25/beware-of-benchmarking-storage-that-does-inline-compression/
This is most likely related to the disk array, so also don't forget to ask the vendor for input on the topic 😉

cheers, MuS


jkat54
SplunkTrust

no clue then 😉
