Infiniband Quickstart Howto
Guy Coates
____________________________________________________________

Table of Contents

1. Introduction
   1.1 What is OFED?

2. Installing the OFED Userspace Software
   2.1 Installing prebuilt packages
   2.2 Building packages from source
      2.2.1 Install the prerequisite development packages
      2.2.2 Check out the svn tree
      2.2.3 Install the upstream source (optional)
      2.2.4 Build the packages.

3. OFED kernel modules
   3.1 Building OFED kernel modules

4. Setting up a basic infiniband network
   4.1 Upgrade your infiniband card and switch firmware
   4.2 Choose a Subnet Manager
   4.3 Load the kernel modules
   4.4 (optional) Start opensm
   4.5 Check network health
   4.6 Check the extended network connectivity
   4.7 Testing connectivity with ibping
   4.8 Testing RDMA performance

5. IP over infiniband (IPoIB)
   5.1 List the network devices
   5.2 IP Configuration
   5.3 Connected vs Unconnected Mode
   5.4 TCP tuning

6. MPI

7. SDP
   7.1 Configuration
   7.2 Example Using SDP with Netpipe

8. SRP
   8.1 Configuration
   8.2 SRP daemon configuration
      8.2.1 Determine the IDs of presented devices
      8.2.2 Configure srp_daemon to connect to the devices
   8.3 Multipathing, LVM and formatting.

9. Building Lustre against OFED
   9.1 Check Compatibility
   9.2 Install the OFED packages
   9.3 Install a set of kernel modules
   9.4 Configure lustre

10. Network Troubleshooting

11. Further Information
______________________________________________________________________

1. Introduction

This document describes how to install and configure the OFED
infiniband software on Debian. It is intended to be a quickstart guide
to configuring basic infiniband functionality. It is not a replacement
for the detailed documentation provided in the ofed-docs package!

1.1. What is OFED?

OFED (OpenFabrics Enterprise Distribution) is the de facto infiniband
software stack on Linux. OFED provides a consistent set of kernel
modules and userspace libraries which have been tested together.
Further details of the OpenFabrics Alliance and OFED can be found at
http://www.openfabrics.org

2. Installing the OFED Userspace Software

Before you can use your infiniband network you will need to install the
OFED software on your infiniband clients. You can either use the
prebuilt packages on alioth, or build your own packages straight from
the alioth SVN repository.

2.1. Installing prebuilt packages

Download and install the packages at
https://alioth.debian.org/frs/?group_id=100311 . Packages are grouped
by OFED release. Unless you know what you are doing, you should install
all of the packages.

Note that some OFED 1.4 packages are already in Debian Lenny. You can
install them from your usual repository.

2.2. Building packages from source

If you wish to build the OFED packages from the alioth svn repository,
use the following procedure.

2.2.1. Install the prerequisite development packages

    aptitude install svn-buildpackage build-essential devscripts

2.2.2. Check out the svn tree

    svn co svn://svn.debian.org/pkg-ofed/

2.2.3. Install the upstream source (optional)

The upstream source tarballs need to be available if you want to build
proper Debian packages suitable for inclusion upstream. If you are
simply building packages for your own use, you can ignore this step.

    cd pkg-ofed
    mkdir tarballs

Populate the tarballs directory with the *.orig.tar.gz files available
from the "upstream source" release on
https://alioth.debian.org/frs/?group_id=100311

2.2.4. Build the packages.

Change into the directory of the package you wish to build, e.g. for
libibcommon:

    cd pkg-ofed/libibcommon

Link in the upstream tarballs directory (optional):

    ln -s -f ../tarballs .

Run svn-buildpackage from within the trunk directory.

    cd pkg-ofed/libibcommon/trunk
    svn-buildpackage -uc -us -rfakeroot

The build process will generate a deb in the build-area directory.
Repeat the process for the rest of the packages. Note that some
packages have build dependencies on other OFED packages. The suggested
build order is:

    libibcm
    libibcommon
    libibumad
    libibmad
    libnes
    libsdp
    dapl
    opensm
    infiniband-diags
    ibutils
    mstflint
    perftest
    qlvnictools
    qperf
    rds-tools
    sdpnetstat
    srptools
    tvflash
    ibsim
    ofed-docs
    ofa_kernel
    ofed

3. OFED kernel modules

You will also require a set of OFED kernel modules which match the
version of the OFED userspace software you have installed.

OFED kernel modules are periodically merged into the mainline kernel,
and so the default Debian kernel will already contain a set of OFED
infiniband drivers. However, the drivers in the Debian kernel may not
match the OFED userspace version you have installed. Consult the table
below to determine which OFED version the Debian kernel contains.

    Debian Kernel Version    OFED Version
    <=2.6.26                 1.3
    >=2.6.27                 1.4

If the Debian kernel modules do not match the OFED version you have
installed, you can build a new set of modules using the
ofa-kernel-source package. If your kernel already includes the correct
OFED kernel modules you can skip the rest of this section. If you are
in doubt, you should build a new set of modules rather than relying on
the modules shipped with the kernel.

3.1. Building OFED kernel modules

You can build new kernel modules using module-assistant.

    aptitude install module-assistant

Ensure you have the ofa-kernel-source package installed, and then run:

    module-assistant prepare
    module-assistant clean ofa-kernel
    module-assistant build ofa-kernel

This will create a deb containing the OFED kernel modules.
The deb contains replacements for existing kernel modules, so before
you can use the new modules you will need to either manually remove any
infiniband modules which have already been loaded, or reboot the
machine.

The new kernel modules will be installed into /usr/lib//updates. They
will not overwrite the original kernel modules, but the kernel module
loader will pick up the modules from the updates directory in
preference. You can verify that the system is using the new kernel
modules by running the modinfo command.

    # modinfo ib_core
    filename:       /lib/modules/2.6.22.19/updates/kernel/drivers/infiniband/core/ib_core.ko
    author:         Roland Dreier
    description:    core kernel InfiniBand API
    license:        Dual BSD/GPL
    vermagic:       2.6.22.19 SMP mod_unload

Note that if you wish to rebuild the kernel modules (e.g. for a new
kernel version) then you must issue the module-assistant clean command
before trying a new build.

4. Setting up a basic infiniband network

This section describes how to set up a basic infiniband network and
test its functionality.

4.1. Upgrade your infiniband card and switch firmware

Before proceeding you should ensure that the firmware in your switches
and infiniband cards is at the latest release. Older firmware versions
may cause interoperability and fabric stability issues. Do not assume
that hardware fresh from the factory has the latest firmware on it.

You should follow the documentation from your vendor as to how the
firmware should be updated.

4.2. Choose a Subnet Manager

Each infiniband network requires a subnet manager. You can choose to
run the OFED opensm subnet manager on one of the Linux clients, or you
may choose to use an embedded subnet manager running on one of the
switches in your fabric. Note that not all switches come with a subnet
manager; check your switch documentation.

4.3. Load the kernel modules

Infiniband kernel modules are not loaded automatically. You should add
them to /etc/modules so that they are loaded automatically at boot. You
will need to include the hardware specific modules and the protocol
modules.

/etc/modules:

    # Hardware drivers
    # Choose the appropriate modules from
    # /lib/modules//updates/kernel/drivers/infiniband/hw
    #
    #mlx4_ib       # Mellanox ConnectX cards
    #ib_mthca      # some mellanox cards
    #iw_cxgb3      # Chelsio T3 cards
    #iw_nes        # NetEffect cards
    #
    # Protocol modules
    # Common modules
    ib_umad
    ib_uverbs
    # IP over IB
    ib_ipoib
    # scsi over IB
    ib_srp
    # IB SDP protocol
    ib_sdp

4.4. (optional) Start opensm

If you choose to use the opensm subnet manager, edit
/etc/default/opensm and add the port GUIDs of the interfaces on which
you wish to start opensm.

You can find the port GUIDs of your cards with the ibstat -p command:

    # ibstat -p
    0x0002c9030002fb05
    0x0002c9030002fb06

/etc/default/opensm:

    PORTS="0x0002c9030002fb05 0x0002c9030002fb06"

Note that if you want to start opensm on all ports you can use the
PORTS="ALL" keyword.

Start opensm:

    # /etc/init.d/opensm start

If opensm has started correctly you should see SUBNET UP messages in
the opensm logfile (/var/log/opensm..log).

    Mar 04 14:56:06 600685 [4580A960] 0x02 -> SUBNET UP

Note that you can start opensm on multiple nodes; one node will be the
active subnet manager and the others will put themselves into standby
and take over if the original subnet manager dies.

4.5. Check network health

You can now check the status of the local infiniband link with the
ibstat command. Connected links should be in the "LinkUp" state. The
following output is from a host with a dual ported card, only one port
of which (port 1) is connected.

    # ibstat
    CA 'mlx4_0'
            CA type: MT25418
            Number of ports: 2
            Firmware version: 2.3.0
            Hardware version: a0
            Node GUID: 0x0002c9030002fb04
            System image GUID: 0x0002c9030002fb07
            Port 1:
                    State: Active
                    Physical state: LinkUp
                    Rate: 20
                    Base lid: 2
                    LMC: 0
                    SM lid: 1
                    Capability mask: 0x02510868
                    Port GUID: 0x0002c9030002fb05
            Port 2:
                    State: Down
                    Physical state: Polling
                    Rate: 10
                    Base lid: 0
                    LMC: 0
                    SM lid: 0
                    Capability mask: 0x02510868
                    Port GUID: 0x0002c9030002fb06

4.6. Check the extended network connectivity

Once the host is connected to the infiniband network you can check the
health of all of the other network components with the ibhosts,
ibswitches and iblinkinfo commands.

ibhosts displays all of the hosts visible on the network.

    # ibhosts
    Ca : 0x0008f1040399d3d0 ports 2 "Voltaire HCA400Ex-D"
    Ca : 0x0008f1040399d370 ports 2 "Voltaire HCA400Ex-D"
    Ca : 0x0008f1040399d3fc ports 2 "Voltaire HCA400Ex-D"
    Ca : 0x0008f1040399d3f4 ports 2 "Voltaire HCA400Ex-D"
    Ca : 0x0002c9030002faf4 ports 2 "MT25408 ConnectX Mellanox Technologies"
    Ca : 0x0002c9030002fc0c ports 2 "MT25408 ConnectX Mellanox Technologies"
    Ca : 0x0002c9030002fc10 ports 2 "MT25408 ConnectX Mellanox Technologies"

ibswitches will display all of the switches in the network.

    # ibswitches
    Switch : 0x0008f104004121fa ports 24 "ISR9024D-M Voltaire" enhanced port 0 lid 1 lmc 0

iblinkinfo will show the status and speed of all of the links in the
network.

    # iblinkinfo.pl
    Switch 0x0008f104004121fa ISR9024D-M Voltaire:
      1  1[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   2  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  2[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  13  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  3[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   4  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  4[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  26  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  5[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  27  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  6[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  24  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  7[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  28  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  8[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  25  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1  9[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  31  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1 10[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  32  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1 11[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  33  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1 12[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  29  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
      1 13[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  30  1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
        14[ ] ==( 4X 2.5 Gbps  Down / Polling)==>        [ ] "" ( )
      1 15[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   3  1[ ] "Voltaire HCA400Ex-D" ( )
      1 16[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  10  1[ ] "Voltaire HCA400Ex-D" ( )
        17[ ] ==( 4X 2.5 Gbps  Down / Polling)==>        [ ] "" ( )
        18[ ] ==( 4X 2.5 Gbps  Down / Polling)==>        [ ] "" ( )
      1 19[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   7  2[ ] "Voltaire HCA400Ex-D" ( )
      1 20[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   6  2[ ] "Voltaire HCA400Ex-D" ( )
      1 21[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   5  2[ ] "Voltaire HCA400Ex-D" ( )
      1 22[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>  21  1[ ] "Voltaire HCA400Ex-D" ( )
      1 23[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   9  2[ ] "Voltaire HCA400Ex-D" ( )
      1 24[ ] ==( 4X 5.0 Gbps Active / LinkUp)==>   8  1[ ] "Voltaire HCA400Ex-D" ( )

4.7. Testing connectivity with ibping

ibping is an infiniband equivalent to the icmp ping command. Choose a
node on the fabric and run an ibping server:

    # ibping -S

Choose another node on your network, and then ping the port GUID of the
server. (ibstat on the server will list the port GUID.)

    # ibping -G 0x0002c9030002fc1d
    Pong from test.example.com (Lid 13): time 0.072 ms
    Pong from test.example.com (Lid 13): time 0.043 ms
    Pong from test.example.com (Lid 13): time 0.045 ms
    Pong from test.example.com (Lid 13): time 0.045 ms

4.8. Testing RDMA performance

You can test the latency and bandwidth of a link with the ib_rdma_lat
and ib_rdma_bw commands.

To test the latency, start the server on a node:

    # ib_rdma_lat

and then start a client on another node, giving it the hostname of the
server.

    # ib_rdma_lat hostname-of-server
    local address: LID 0x0d QPN 0x18004a PSN 0xca58c4 RKey 0xda002824 VAddr 0x00000000509001
    remote address: LID 0x02 QPN 0x7c004a PSN 0x4b4eba RKey 0x82002466 VAddr 0x00000000509001
    Latency typical: 1.15193 usec
    Latency best : 1.13094 usec
    Latency worst : 5.48519 usec

To test the bandwidth of the link, start the ib_rdma_bw server on a
node:

    # ib_rdma_bw

and then start a client on another node, giving it the hostname of the
server.

    # ib_rdma_bw hostname-of-server
    855: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
    855: Local address: LID 0x0d, QPN 0x1c004a, PSN 0xbf60dd RKey 0xde002824 VAddr 0x002aea4092b000
    855: Remote address: LID 0x02, QPN 0x004a, PSN 0xaad03c, RKey 0x86002466 VAddr 0x002b8a4e191000
    855: Bandwidth peak (#0 to #955): 1486.85 MB/sec
    855: Bandwidth average: 1486.47 MB/sec
    855: Service Demand peak (#0 to #955): 1970 cycles/KB
    855: Service Demand Avg : 1971 cycles/KB

The perftest package contains a number of other similar benchmarking
programs to test various aspects of your network.
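
When benchmarking many links it can help to pull just the headline
figures out of the perftest output. The following is a minimal sketch,
not part of OFED: the function names bw_average, lat_typical and
bw_at_least are made up for illustration, and the parsing assumes the
"Bandwidth average:" and "Latency typical:" labels appear exactly as in
the sample output above. Other perftest versions may format their
output differently.

```shell
#!/bin/sh
# Sketch: extract headline figures from ib_rdma_bw / ib_rdma_lat output.
# Assumes the label text shown in the sample output in section 4.8.

# Print the average bandwidth (MB/sec) found in ib_rdma_bw output on stdin.
bw_average() {
    awk '/Bandwidth average:/ { print $(NF-1) }'
}

# Print the typical latency (usec) found in ib_rdma_lat output on stdin.
lat_typical() {
    awk '/Latency typical:/ { print $(NF-1) }'
}

# Succeed only if the average bandwidth on stdin is at least $1 MB/sec.
# Fails if no "Bandwidth average:" line is present at all.
bw_at_least() {
    bw_average | awk -v min="$1" '
        NR == 1 { ok = ($1 + 0 >= min + 0) }
        END     { exit !ok }'
}
```

For example, on the client node you might run
"ib_rdma_bw hostname-of-server | bw_average" to print only the average
figure, or pipe the same output into "bw_at_least 1000" in a script to
flag links that fall below a threshold of your choosing.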

5. IP over infiniband (IPoIB)

The OFED stack allows you to run TCP/IP over your infiniband network.
This is useful for running non-infiniband aware applications across
your network. Several native infiniband applications also use IPoIB for
host resolution (e.g. Lustre and SDP).

5.1. List the network devices

Check that the IPoIB module is loaded.

    # modprobe ib_ipoib

You will now have an "ib" network interface for each of your infiniband
cards.

    # ifconfig -a
    ib0       Link encap:UNSPEC  HWaddr 80-06-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
              BROADCAST MULTICAST  MTU:2044  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

    ib1       Link encap:UNSPEC  HWaddr 80-06-00-49-FE-80-00-00-00-00-00-00-00-00-00-00
              BROADCAST MULTICAST  MTU:2044  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:256
              RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

5.2. IP Configuration

You can now configure the ib network devices using
/etc/network/interfaces.

    auto ib0
    iface ib0 inet static
        address 172.31.128.50
        netmask 255.255.240.0
        broadcast 172.31.143.255

Bring the network device up, as normal.

    ifup ib0

You can test IPoIB functionality using the usual IP networking tools
(e.g. ping, netperf etc).

5.3. Connected vs Unconnected Mode

IPoIB can run over two infiniband transports: Unreliable Datagram (UD)
mode or Connected mode (CM). The differences between these two modes
are described in:

    RFC4392 - IP over InfiniBand (IPoIB) Architecture
    RFC4391 - Transmission of IP over InfiniBand (IPoIB) (UD mode)
    RFC4755 - IP over InfiniBand: Connected Mode

ADDME: Pro/cons of these two methods?

You can switch between these two modes at runtime with:

    echo datagram > /sys/class/net/ibX/mode
    echo connected > /sys/class/net/ibX/mode

The default is datagram (UD) mode. If you wish to use CM then you can
add a script to /etc/network/if-up.d to automatically set CM mode on
your interfaces when they are configured.

5.4. TCP tuning

In order to obtain maximum IPoIB throughput you may need to tweak the
MTU and various kernel TCP buffer and window settings. See the details
in the ipoib_release_notes.txt document in the ofed-docs package.

6. MPI

ADDME: How to run a test MPI application

7. SDP

Sockets Direct Protocol (SDP) is an infiniband network protocol which
provides an RDMA accelerated alternative to TCP. OFED provides an
LD_PRELOADable library (libsdp.so), which allows code written to use
TCP to use the more efficient SDP protocol instead. Because the library
is LD_PRELOADed, the program does not need to be recompiled.

7.1. Configuration

SDP uses IPoIB for address resolution, so you must configure IPoIB
before using SDP. You should also ensure the ib_sdp kernel module is
loaded.

    modprobe ib_sdp

You can use libsdp in two ways: you can either manually LD_PRELOAD the
library whilst invoking your application, or create a config file which
specifies which applications will use SDP.

To manually LD_PRELOAD a library, simply set the LD_PRELOAD variable
before invoking your application.

    LD_PRELOAD=libsdp.so ./path/to/your/application ...

If you wish to choose which programs will use SDP you can edit
/etc/sdp.conf and specify which programs, ports and addresses are
eligible for use. See the comments in the config file for the syntax.

7.2. Example Using SDP with Netpipe

The following example shows how to use libsdp to make the TCP
benchmarking application, netpipe, use SDP rather than TCP. NodeA is
the server and NodeB is the client.
IPoIB is configured on both nodes, and NodeA's IPoIB address is
10.0.0.1.

Install netpipe on both nodes.

    aptitude install netpipe-tcp

First, run the netpipe benchmark over TCP in order to obtain a baseline
number.

    nodeA# NPtcp

    nodeB# NPtcp -h 10.0.0.1
    Send and receive buffers are 16384 and 87380 bytes
    (A bug in Linux doubles the requested buffer sizes)
    Now starting the main loop
      0:       1 bytes   2778 times -->      0.22 Mbps in      34.04 usec
      1:       2 bytes   2937 times -->      0.45 Mbps in      33.65 usec
      2:       3 bytes   2971 times -->      0.69 Mbps in      33.41 usec

    121: 8388605 bytes      3 times -->   2951.89 Mbps in   21680.99 usec
    122: 8388608 bytes      3 times -->   3008.08 Mbps in   21276.00 usec
    123: 8388611 bytes      3 times -->   2941.76 Mbps in   21755.66 usec

Now repeat the test, but force netpipe to use SDP rather than TCP.

    nodeA# LD_PRELOAD=libsdp.so NPtcp

    nodeB# LD_PRELOAD=libsdp.so NPtcp -h 10.0.0.1
    Send and receive buffers are 16384 and 87380 bytes
    (A bug in Linux doubles the requested buffer sizes)
    Now starting the main loop
      0:       1 bytes   9765 times -->      1.45 Mbps in       5.28 usec
      1:       2 bytes  18946 times -->      2.80 Mbps in       5.46 usec
      2:       3 bytes  18323 times -->      4.06 Mbps in       5.63 usec

    121: 8388605 bytes      5 times -->   7665.51 Mbps in    8349.08 usec
    122: 8388608 bytes      5 times -->   7668.62 Mbps in    8345.70 usec
    123: 8388611 bytes      5 times -->   7629.04 Mbps in    8389.00 usec

You should see a significant increase in performance when using SDP.

8. SRP

SRP (SCSI Remote Protocol or SCSI RDMA Protocol) is a protocol that
allows the use of SCSI devices across infiniband. If you have
infiniband storage, then you can access the devices via SRP.

8.1. Configuration

Ensure that your infiniband storage is presented to the host in
question. Check your storage controller documentation. Ensure that the
ib_srp kernel module is loaded and that the srptools package is
installed.

    modprobe ib_srp

8.2. SRP daemon configuration

srp_daemon is responsible for discovering and connecting to SRP
targets.
The default configuration shipped with srp_daemon is to ignore all
presented devices; this is a failsafe to prevent devices from being
mounted by accident on the wrong hosts.

The srp_daemon config file /etc/srp_daemon.conf has a simple syntax,
and is described in the srp_daemon(1) manpage. Each line in this file
is a rule which either allows or disallows connection, according to the
first character of the line (a or d respectively) and the ID of the
storage device.

8.2.1. Determine the IDs of presented devices

You can determine the IDs of SRP devices presented to your hosts by
running the ibsrpdm -c command.

    # ibsrpdm -c
    id_ext=50001ff10005052a,ioc_guid=50001ff10005052a,dgid=fe8000000000000050001ff10005052a,pkey=ffff,service_id=2a050500f11f0050

8.2.2. Configure srp_daemon to connect to the devices

Once you have the IDs of the devices, you can add them to
/etc/srp_daemon.conf. You can also specify other SRP related options
for the target, such as max_cmd_per_lun and max_sect. These are storage
specific; check your vendor documentation for recommended values.

    # This rule allows connection to our target
    a id_ext=50001ff10005052a,ioc_guid=50001ff10005052a,max_cmd_per_lun=32,max_sect=65535
    # This rule disallows everything else
    d

Restart srp_daemon and the storage target should now become visible;
check the kernel log to see if the disk has been detected.

    /etc/init.d/srptools restart

In the example kernel log output below, the disk has been discovered as
SCSI device sdb.

    scsi 3:0:0:1: Direct-Access IBM DCS9900 5.03 PQ: 0 ANSI: 5
    sd 3:0:0:1: [sdb] 1953458176 4096-byte hardware sectors (8001365 MB)
    sd 3:0:0:1: [sdb] Write Protect is off
    sd 3:0:0:1: [sdb] Mode Sense: 97 00 10 08
    sd 3:0:0:1: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
    sd 3:0:0:1: [sdb] 1953458176 4096-byte hardware sectors (8001365 MB)
    sd 3:0:0:1: [sdb] Write Protect is off
    sd 3:0:0:1: [sdb] Mode Sense: 97 00 10 08
    sd 3:0:0:1: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
    sdb:<6>scsi4 : SRP.T10:50001FF10005052A
    unknown partition table
    sd 3:0:0:1: [sdb] Attached SCSI disk
    sd 3:0:0:1: Attached scsi generic sg5 type 0

8.3. Multipathing, LVM and formatting.

The newly detected SRP device can be treated as any other SCSI device.
If you have multiple infiniband adapters you can use multipath-tools on
top of the SRP devices to protect against a network failure. If you are
not using multipathed IO you can simply format the device as normal.

9. Building Lustre against OFED

Lustre is a scalable cluster filesystem popular on high performance
compute clusters. See http://www.lustre.org for more information.
Lustre can use infiniband as one of its network transports in order to
increase performance. This section describes how to compile Lustre
against the OFED infiniband stack.

9.1. Check Compatibility

9.2. Install the OFED packages

9.3. Install a set of kernel modules

9.4. Configure lustre

10. Network Troubleshooting

Diags:

    ibdiagnet -r

Applications

Lustre over IB.

Example MPI application. openmpi-dev

SDP

11. Further Information