[Pkg-ofed-devel] Infiniband performance: Maturity of kernel and tools in debian wheezy?

Wolfgang Rosner wrosner at tirnet.de
Fri Apr 3 15:36:15 UTC 2015


Hello, Debian InfiniBand pros,


Can I consider the InfiniBand tools in wheezy and the kernel in 
wheezy-backports as "state of the art"? Or can I expect considerable 
performance improvements from building from recent sources?


There are some HOWTOs on the web saying "always use the latest versions", but 
they are all more than five years old.

Can I conclude that InfiniBand development has settled down, and there is no 
point in chasing the latest releases?

On the other hand, 
http://downloads.openfabrics.org/downloads/
shows considerably higher version numbers than those I get from the Debian packages.
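
For reference, this is roughly how I would compare what is installed against 
the upstream tarballs (the package names are only my guess at the relevant 
wheezy ones):

  # which of the userspace IB pieces are actually installed
  dpkg -l | grep -E 'ibverbs|infiniband-diags|opensm|perftest|rdma'

  # driver and firmware versions as the verbs layer reports them
  ibv_devinfo | grep -E 'hca_id|fw_ver|board_id'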



Right now I'm working my way through the InfiniBand HOWTO.
I'm stuck at the performance chapter:
http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html#ss4.9
because I find the raw performance of ib_rdma_bw and its companions 
disappointing. Not to mention ibping, which barely matches Ethernet 
latency values (~ 0.120 .. 0.150 ms).
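
As far as I understand, ibping goes through the management datagram path, so 
it may not say much about verbs-level latency anyway; something from the 
perftest suite, e.g. ib_send_lat, is probably the fairer comparison (sketch 
only, the address is just an example from my subnet):

  # on one node, start the server side
  ib_send_lat

  # on the other node, point the client at the server's IPoIB address
  ib_send_lat 192.168.130.10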

I don't get closer than 55 % of the theoretical throughput.
OK, I have learned that there is 8b/10b encoding on the wire, but 
between 55 % and 80 % there is still quite a gap left, I would think.
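
Just to put my expectations into numbers (back-of-envelope only, please 
correct me if the assumptions are off):

  # InfiniBand wire: 8b/10b leaves 80 % of the signalling rate as payload
  echo $(( 40 * 8 / 10 ))   # -> 32 GBit/s data on a nominal 40 GBit/s link

  # PCIe side: lspci reports 2.5 GT/s x8 (gen1), which is also 8b/10b encoded:
  # 2.5 GT/s * 8 lanes = 20 GBit/s raw
  echo $(( 20 * 8 / 10 ))   # -> 16 GBit/s ~ 2 GB/s per direction, before
                            #    any TLP / protocol overhead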

Before trying InfiniBand, I experimented with teql layer-3 bonding of 6 x 
1 GBit Ethernet links, which yielded 5.7 GBit/s, i.e. 95 % of the theoretical 
maximum. Admittedly, though, that setup was nowhere near the PCIe bus limit.


I reconfigured my blades to make sure the IB HCA gets PCIe x8 bandwidth.
It was x4 before; the change doubled throughput, as I expected.
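
In case someone wants to double-check their own slots: the negotiated link can 
be compared against what the slot offers like this (the PCI address is just an 
example, take it from plain lspci output first):

  # LnkCap = what card and slot could do, LnkSta = what was actually negotiated
  lspci -vv -s 07:00.0 | grep -E 'LnkCap|LnkSta'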

I upgraded the firmware on my HCAs, but that had no effect.
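
(For the record, the firmware level can be read back with mstflint or via the 
driver; the PCI address is again only an example:)

  # query the flash image on the HCA (mstflint package)
  mstflint -d 07:00.0 query

  # or what the running driver reports
  ibstat | grep -i firmware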



A parallel bidirectional RDMA bandwidth test like this
for i  in 10 11 14 15 16 ; do ( ib_rdma_bw -b  192.168.130.${i} & ) ; done

yields
712+712+572+570+575 = 3141 MB/s, which is ~ 25 GBit/s ~ 62 % of 40 GBit/s

The same thing unidirectional (without the -b option) is almost exactly half:
 354+354+288+286+287 = 1569 MB/s ~ 12552 MBit/s

Running the tests sequentially (without the &) gives
2820 MB/sec ~ 22 GBit/s	bidirectional
1410 MB/sec ~ 11 GBit/s	unidirectional
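
(In case somebody wants to reproduce the numbers: each client above of course 
needs a server instance already listening on the remote node; with the wheezy 
perftest tools that should simply be the same command without a peer address, 
if I remember the syntax right:)

  # on each blade, start the server side first; it just waits for a client
  ib_rdma_bw -b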

So it does not look like a bottleneck on the blade side or on the physical 
path.

Blade<->blade bandwidth is somewhat lower across all setups 
(e.g. 2675.37 MB/sec for the sequential bidirectional test).

Are there any significant rewards to be expected from further tuning, 
or did I hit the hardware ceiling already?


Wolfgang Rosner


=============================================

System details

A "poor man's Beowulf cluster" built from "eBay'd old server hardware" (HP 
blade center) and a head node on a recent "premium consumer grade" mainboard:


Wheezy backports kernel:
$ uname -a
Linux cruncher 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt4-3~bpo70+1 (2015-02-12) x86_64 GNU/Linux


Head node:
Sabertooth 990FX 2.0, AMD FX 8320 Eight-Core, 
debian wheezy 7.7
Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) 
(rev 20)
lspci LnkSta: Speed 2.5GT/s, Width x8,
module 'mthca0' Firmware version: 4.8.930

Cluster nodes:
HP Blades BL460c G1, Dual Intel Xeon QuadCore (mixed X5355 and E5430)
debian wheezy 7.8
Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
lspci LnkSta: Speed 2.5GT/s, Width x8,


Switch:
"MT47396 Infiniscale-III Mellanox Technologies" base port 0 lid 3 lmc 0
(HP blade center switch)
(no clue how to read, let alone upgrade, its firmware level)
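
My only idea so far for at least reading the switch firmware in-band would be 
something like this (untested, so take it with a grain of salt):

  # list the switches the subnet manager sees, with their LIDs
  ibswitches

  # Mellanox vendor-specific general info for the switch at LID 3
  # (should include the firmware version on InfiniScale III)
  vendstat -N 3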





