[Pkg-ofed-commits] r340 - /trunk/howto/infiniband-howto.sgml

gmpc-guest at alioth.debian.org gmpc-guest at alioth.debian.org
Mon Jun 1 12:35:35 UTC 2009


Author: gmpc-guest
Date: Mon Jun  1 12:05:17 2009
New Revision: 340

URL: http://svn.debian.org/wsvn/pkg-ofed/?sc=1&rev=340
Log:
Add IB howto

Added:
    trunk/howto/infiniband-howto.sgml

Added: trunk/howto/infiniband-howto.sgml
URL: http://svn.debian.org/wsvn/pkg-ofed/trunk/howto/infiniband-howto.sgml?rev=340&op=file
==============================================================================
--- trunk/howto/infiniband-howto.sgml (added)
+++ trunk/howto/infiniband-howto.sgml Mon Jun  1 12:05:17 2009
@@ -1,0 +1,701 @@
+<!doctype linuxdoc system>
+<article>
+<title>Infiniband Howto
+<author>Guy Coates 
+
+<toc>
+
+<sect>Introduction
+<p>
+This document describes how to install and configure the OFED infiniband software on Debian. It is intended
+to be a quickstart guide for getting an Infiniband network configured as quickly as possible. It is not a replacement
+for the detailed documentation provided in the ofed-docs package!
+
+<sect1>What is OFED?
+<p>
+OFED (the OpenFabrics Enterprise Distribution) is the de facto Infiniband software stack on Linux. OFED 
+provides a consistent set of kernel modules and userspace libraries which have been tested together.
+
+Further details of the OpenFabrics Alliance and OFED can be found at <url url="http://www.openfabrics.org/" 
+name="http://www.openfabrics.org">.
+
+
+<sect>Installing the OFED Software
+<p>
+Before you can use your infiniband network you will need to install the OFED software on your infiniband client machines.
+You can either use the pre-built packages on alioth, or build your own packages straight from the alioth SVN repository.
+<sect1>Installing prebuilt packages
+<p>
+Download and install the packages at <url url="https://alioth.debian.org/frs/?group_id=100311" 
+name="https://alioth.debian.org/frs/?group_id=100311">. Packages are grouped by OFED release. 
+Unless you know what you are doing, you should install all of the packages. Note that some OFED 1.4 packages are already
+in Debian Lenny; you can install them from your usual repository.
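As a sketch (assuming the debs for your chosen release have been downloaded into the current directory), the whole set can be installed in one go:

```
# Install every downloaded OFED deb; dpkg orders them as best it can,
# and apt-get -f install pulls in any missing dependencies afterwards.
dpkg -i *.deb
apt-get -f install
```

Both commands must be run as root.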
+
+
+<sect1>Building packages from source
+<p>
+If you wish to build the OFED packages from the alioth svn repository, use the following procedure.
+
+<sect2>Install the prerequisite development packages
+<p>
+<tscreen>
+<verb>
+aptitude install svn-buildpackage build-essential devscripts
+</verb>
+</tscreen>
+<sect2> Checkout the svn tree
+<p>
+<tscreen>
+svn co svn://svn.debian.org/pkg-ofed/
+</tscreen>
+<sect2>Install the upstream source (optional)
+<p>
+The upstream source tarballs need to be available if you
+want to build proper Debian packages suitable for inclusion
+upstream. If you are simply building packages for your own use,
+you can skip this step.
+<tscreen>
+<verb>
+cd pkg-ofed
+mkdir tarballs
+</verb>
+</tscreen>
+
+Populate the tarballs directory with the *.orig.tar.gz files available from 
+the "upstream source" release on <url url="https://alioth.debian.org/frs/?group_id=100311" name ="https://alioth.debian.org/frs/?group_id=100311">
+
+<sect2> Build the packages.
+<p>
+cd into the package you wish to build, e.g. for libibcommon:
+<tscreen>
+ cd pkg-ofed/libibcommon
+</tscreen>
+Link in the upstream tarballs directory (optional)
+<tscreen>
+ ln -s -f ../tarballs .
+</tscreen>
+Run svn-buildpackage from within the trunk directory.
+<tscreen><verb>
+ cd pkg-ofed/libibcommon/trunk
+ svn-buildpackage -uc -us -rfakeroot 
+</verb>
+</tscreen>
+The build process will generate a deb in the build-area directory. 
+
+Repeat the process for the rest of the packages. Note that some packages have build dependencies on other OFED packages. The suggested build order is:
+<tscreen>
+<verb>
+ libibcm
+ libibcommon
+ libibumad
+ libibmad
+ libnes
+ libsdp
+ dapl
+ opensm
+ infiniband-diags
+ ibutils
+ mstflint
+ perftest
+ qlvnictools
+ qperf
+ rds-tools
+ sdpnetstat
+ srptools
+ tvflash
+ ibsim
+ ofed-docs
+ ofa_kernel
+ ofed
+</verb>
+</tscreen>
+
+
+
+<sect>Install the kernel modules
+<p>
+You now need to build a set of OFED kernel modules which match the version of the OFED software you have installed.
+
+The Debian kernel contains a set of OFED infiniband drivers, but they may not match the OFED userspace version you have installed.
+Consult the table below to determine which OFED version the Debian kernel contains. 
+
+<tscreen>
+<verb>
+Debian Kernel Version      OFED Version
+<=2.6.26                       1.3
+>=2.6.27                       1.4
+</verb>
+</tscreen>
+
+
+If the Debian kernel modules are the incorrect version, you can build a new set of modules using the ofa-kernel-source package.
+If your kernel already includes the correct OFED kernel modules you can skip the rest of this section. If in doubt, you should
+build a new set of modules rather than rely on the modules shipped with the kernel.
+
+<sect1>Building new kernel modules
+<p>
+You can build new kernel modules using module-assistant.
+<tscreen>
+<verb>
+aptitude install module-assistant
+</verb>
+</tscreen>
+
+Ensure you have the ofa-kernel-source package installed, and then run:
+<tscreen>
+ <verb>
+ module-assistant prepare
+ module-assistant clean ofa-kernel
+ module-assistant build ofa-kernel
+</verb>
+</tscreen>
+This will create a deb which you can then install. As the deb contains replacements for existing kernel modules, you will need either to manually remove 
+any infiniband modules which have already been loaded, or to reboot the machine, before you can use the new modules. 
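For example, the loaded modules can be removed by hand as follows (a sketch; unload the protocol modules before the hardware driver, and substitute the hardware module for your card):

```
# Protocol modules first...
modprobe -r ib_ipoib ib_srp ib_sdp
modprobe -r ib_umad ib_uverbs
# ...then the hardware driver (mlx4_ib shown as an example).
modprobe -r mlx4_ib
```

modprobe -r will refuse to unload a module that is still in use, so any applications using the fabric must be stopped first.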
+
+The new kernel modules will be installed into /lib/modules/&lt;kernel-version&gt;/updates. They will not overwrite the original kernel modules, but the module
+loader will pick up the modules from the updates directory in preference. You can verify that the system is using the new kernel modules by running the 
+modinfo command.
+
+<tscreen>
+<verb>
+# modinfo ib_core
+filename:       /lib/modules/2.6.22.19/updates/kernel/drivers/infiniband/core/ib_core.ko
+author:         Roland Dreier
+description:    core kernel InfiniBand API
+license:        Dual BSD/GPL
+vermagic:       2.6.22.19 SMP mod_unload 
+</verb>
+</tscreen>
+
+
+Note that if you wish to rebuild the kernel modules (e.g. for a new kernel version) you must issue
+the module-assistant clean command before trying a new build.
+
+
+<sect>Setting up a basic infiniband network   
+<p>
+This section describes how to set up a basic infiniband network and test its functionality.
+
+<sect1>Upgrade your Infiniband card and switch firmware
+<p>
+Before proceeding you should ensure that the firmware in your switches and infiniband cards is at the latest release. 
+Older firmware versions may cause interoperability and fabric stability issues. Do not assume that hardware fresh 
+from the factory has the latest firmware on it. 
+
+You should follow the documentation from your vendor as to how the firmware should be updated.
+
+<sect1>Physically Connect the network
+<p>
+Connect up your hosts and switches.
+
+<sect1>Choose a Subnet Manager
+<p>
+Each infiniband network requires a subnet manager.  You can choose to run the OFED opensm subnet manager on one of the
+linux clients, or you may choose to use an embedded subnet manager running on one of the switches in your fabric. Note
+that not all switches come with a subnet manager; check your switch documentation.
+
+
+<sect1>Load the kernel modules
+<p>
+Infiniband kernel modules are not loaded automatically. You should add them to /etc/modules so that they are loaded automatically on machine
+bootup. You will need to include the hardware-specific modules and the protocol modules.
+
+
+/etc/modules:
+<verb>
+# Hardware drivers
+# Choose the appropriate modules from
+# /lib/modules/&lt;kernel-version&gt;/updates/kernel/drivers/infiniband/hw
+#
+#mlx4_ib  # Mellanox ConnectX cards
+#ib_mthca # some Mellanox cards
+#iw_cxgb3 # Chelsio T3 cards
+#iw_nes # NetEffect cards
+#
+# Protocol modules
+# Common modules
+ib_umad
+ib_uverbs
+# IP over IB
+ib_ipoib
+# scsi over IB 
+ib_srp
+# IB SDP protocol
+ib_sdp
+</verb>
+
+
+<sect1>(optional) Start opensm
+<p>
+If you are going to use the opensm subnet manager, edit /etc/default/opensm and add the port 
+GUIDs of the interfaces on which you wish to start opensm. 
+
+You can find the port GUIDs of your cards with the ibstat -p command:
+<tscreen>
+<verb>
+# ibstat -p
+0x0002c9030002fb05
+0x0002c9030002fb06
+</verb>
+</tscreen>
+
+/etc/default/opensm:
+<tscreen>
+<verb>
+PORTS="0x0002c9030002fb05 0x0002c9030002fb06"
+</verb>
+</tscreen>
+
+Note that if you want to start opensm on all ports you can use the PORTS="ALL" keyword.
+
+Start opensm:
+
+<verb>
+#/etc/init.d/opensm start
+</verb>
+
+If opensm has started correctly you should see SUBNET UP messages in the opensm logfile (/var/log/opensm.&lt;PORTID&gt;.log).
+
+<verb>
+Mar 04 14:56:06 600685 [4580A960] 0x02 -> SUBNET UP
+</verb>
+
+Note that you can start opensm on multiple nodes; one node will be the active subnet manager and the others will put themselves into standby.
+
+
+<sect1>Check network health
+<p>
+You can now check the status of the local IB link with the ibstat command.  Connected links should be in the "LinkUp" state. The following
+output is from a dual-ported card, of which only one port (port 1) is connected.
+
+<tscreen><verb>
+# ibstat
+CA 'mlx4_0'
+        CA type: MT25418
+        Number of ports: 2
+        Firmware version: 2.3.0
+        Hardware version: a0
+        Node GUID: 0x0002c9030002fb04
+        System image GUID: 0x0002c9030002fb07
+        Port 1:
+                State: Active
+                Physical state: LinkUp
+                Rate: 20
+                Base lid: 2
+                LMC: 0
+                SM lid: 1
+                Capability mask: 0x02510868
+                Port GUID: 0x0002c9030002fb05
+        Port 2:
+                State: Down
+                Physical state: Polling
+                Rate: 10
+                Base lid: 0
+                LMC: 0
+                SM lid: 0
+                Capability mask: 0x02510868
+                Port GUID: 0x0002c9030002fb06
+</verb></tscreen>
+
+<sect1>Check the extended network connectivity
+<p>
+Once the host is connected to the infiniband network you can check the health of all of the other network components with the ibhosts, ibswitches and iblinkinfo commands.
+
+ibhosts displays all of the hosts visible on the network.
+
+<tscreen><verb>
+# ibhosts
+Ca      : 0x0008f1040399d3d0 ports 2 "Voltaire HCA400Ex-D"
+Ca      : 0x0008f1040399d370 ports 2 "Voltaire HCA400Ex-D"
+Ca      : 0x0008f1040399d3fc ports 2 "Voltaire HCA400Ex-D"
+Ca      : 0x0008f1040399d3f4 ports 2 "Voltaire HCA400Ex-D"
+Ca      : 0x0002c9030002faf4 ports 2 "MT25408 ConnectX Mellanox Technologies"
+Ca      : 0x0002c9030002fc0c ports 2 "MT25408 ConnectX Mellanox Technologies"
+Ca      : 0x0002c9030002fc10 ports 2 "MT25408 ConnectX Mellanox Technologies"
+</verb></tscreen>
+
+ibswitches will display all of the switches in the network.
+<tscreen><verb>
+# ibswitches
+Switch  : 0x0008f104004121fa ports 24 "ISR9024D-M Voltaire" enhanced port 0 lid 1 lmc 0
+</verb></tscreen>
+
+iblinkinfo will show the status and speed of all of the links in the network.
+<tscreen><verb>
+#iblinkinfo.pl 
+Switch 0x0008f104004121fa ISR9024D-M Voltaire:
+      1    1[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       2    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    2[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      13    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    3[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       4    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    4[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      26    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    5[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      27    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    6[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      24    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    7[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      28    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    8[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      25    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1    9[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      31    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1   10[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      32    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1   11[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      33    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1   12[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      29    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+      1   13[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      30    1[  ] "MT25408 ConnectX Mellanox Technologies" (  )
+          14[  ]  ==( 4X 2.5 Gbps   Down /  Polling)==>             [  ] "" (  )
+      1   15[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       3    1[  ] "Voltaire HCA400Ex-D" (  )
+      1   16[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      10    1[  ] "Voltaire HCA400Ex-D" (  )
+          17[  ]  ==( 4X 2.5 Gbps   Down /  Polling)==>             [  ] "" (  )
+          18[  ]  ==( 4X 2.5 Gbps   Down /  Polling)==>             [  ] "" (  )
+      1   19[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       7    2[  ] "Voltaire HCA400Ex-D" (  )
+      1   20[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       6    2[  ] "Voltaire HCA400Ex-D" (  )
+      1   21[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       5    2[  ] "Voltaire HCA400Ex-D" (  )
+      1   22[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>      21    1[  ] "Voltaire HCA400Ex-D" (  )
+      1   23[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       9    2[  ] "Voltaire HCA400Ex-D" (  )
+      1   24[  ]  ==( 4X 5.0 Gbps Active /   LinkUp)==>       8    1[  ] "Voltaire HCA400Ex-D" (  )
+</verb></tscreen>
+
+<sect1>Testing connectivity with ibping
+<p>
+ibping is an infiniband equivalent of the icmp ping command. Choose a node on the fabric and run an ibping server:
+<tscreen>
+#ibping -S
+</tscreen>
+
+Choose another node on your network, and then ping the port GUID of the server (ibstat on the server will list the port GUID).
+
+<tscreen>
+<verb>
+#ibping -G 0x0002c9030002fc1d
+Pong from test.example.com (Lid 13): time 0.072 ms
+Pong from test.example.com (Lid 13): time 0.043 ms
+Pong from test.example.com (Lid 13): time 0.045 ms
+Pong from test.example.com (Lid 13): time 0.045 ms
+</verb>
+</tscreen>
+
+<sect1>Testing RDMA performance
+<p>
+
+You can test the latency and bandwidth of a link with the ib_rdma_lat and ib_rdma_bw commands.
+
+To test the latency, start the server on a node:
+<tscreen>
+#ib_rdma_lat
+</tscreen>
+and then start a client on another node, giving it the hostname of the server.
+<tscreen>
+<verb>
+#ib_rdma_lat  hostname-of-server
+   local address: LID 0x0d QPN 0x18004a PSN 0xca58c4 RKey 0xda002824 VAddr 0x00000000509001
+  remote address: LID 0x02 QPN 0x7c004a PSN 0x4b4eba RKey 0x82002466 VAddr 0x00000000509001
+Latency typical: 1.15193 usec
+Latency best   : 1.13094 usec
+Latency worst  : 5.48519 usec
+</verb>
+</tscreen>
+
+You can test the bandwidth of the link using the ib_rdma_bw command. Start the server on a node:
+<tscreen>
+#ib_rdma_bw
+</tscreen>
+and then start a client on another node, giving it the hostname of the server.
+<tscreen>
+<verb>
+#ib_rdma_bw  hostname-of-server
+855: | port=18515 | ib_port=1 | size=65536 | tx_depth=100 | iters=1000 | duplex=0 | cma=0 |
+855: Local address:  LID 0x0d, QPN 0x1c004a, PSN 0xbf60dd RKey 0xde002824 VAddr 0x002aea4092b000
+855: Remote address: LID 0x02, QPN 0x004a, PSN 0xaad03c, RKey 0x86002466 VAddr 0x002b8a4e191000
+
+
+855: Bandwidth peak (#0 to #955): 1486.85 MB/sec
+855: Bandwidth average: 1486.47 MB/sec
+855: Service Demand peak (#0 to #955): 1970 cycles/KB
+855: Service Demand Avg  : 1971 cycles/KB
+
+</verb>
+</tscreen>
+
+The perftest package contains a number of other similar benchmarking programs to test various aspects of your network.
+
+
+<sect>IP over Infiniband (IPoIB)
+<p>
+The OFED stack allows you to run TCP/IP over your infiniband network, so that non-infiniband-aware applications can run across
+your network. Several native infiniband applications also use IPoIB for host resolution (e.g. Lustre and SDP).
+
+<sect1>List the network devices
+<p>
+Check that the IPoIB module is loaded.
+
+<tscreen>
+#modprobe ib_ipoib 
+</tscreen>
+You will now have an "ib" network interface for each of your infiniband cards.
+<tscreen>
+<verb>
+#ifconfig -a
+
+&lt;snip&gt;
+ib0       Link encap:UNSPEC  HWaddr 80-06-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
+          BROADCAST MULTICAST  MTU:2044  Metric:1
+          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+          collisions:0 txqueuelen:256 
+          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
+
+ib1       Link encap:UNSPEC  HWaddr 80-06-00-49-FE-80-00-00-00-00-00-00-00-00-00-00  
+          BROADCAST MULTICAST  MTU:2044  Metric:1
+          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
+          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
+          collisions:0 txqueuelen:256 
+          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
+&lt;snip&gt;
+</verb>
+</tscreen>
+
+<sect1>IP Configuration
+<p>
+You can now configure the ib network devices using /etc/network/interfaces.
+<tscreen>
+<verb>
+auto ib0
+iface ib0 inet static
+  address 172.31.128.50
+  netmask 255.255.240.0
+  broadcast 172.31.143.255
+</verb>
+</tscreen>
+Bring the network device up, as normal.
+<tscreen>
+ifup ib0
+</tscreen>
+
+<sect1>Connected vs Unconnected Mode
+<p>
+IPoIB can run over two infiniband transports: Unreliable Datagram (UD) mode or Connected Mode (CM). The differences between
+these two modes are described in:
+<verb>
+RFC4392 - IP over InfiniBand (IPoIB) Architecture
+RFC4391 - Transmission of IP over InfiniBand (IPoIB) (UD mode)
+RFC4755 - IP over InfiniBand: Connected Mode
+</verb>
+ADDME: Pro/cons of these two methods?
+
+You can switch between these two modes at runtime with:
+<tscreen>
+<verb>  
+ echo datagram > /sys/class/net/ibX/mode 
+ echo connected > /sys/class/net/ibX/mode
+</verb>
+</tscreen>
+
+The default is datagram (UD) mode. If you wish to use CM you can add a script to /etc/network/if-up.d to
+automatically set CM mode on your interfaces when they are configured.
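A minimal sketch of such a hook follows. The path /etc/network/if-up.d/ipoib-mode is a hypothetical name; ifupdown exports IFACE to these hooks, and the SYSFS override exists only so the logic can be exercised without infiniband hardware.

```shell
#!/bin/sh
# Hypothetical /etc/network/if-up.d/ipoib-mode hook (name is an assumption).
# Switches an IPoIB interface to connected mode when it is brought up.
# SYSFS defaults to /sys; it is overridable purely for testing.
set_ipoib_mode() {
    iface=$1
    modefile="${SYSFS:-/sys}/class/net/$iface/mode"
    # Only write if the mode file exists and is writable.
    if [ -w "$modefile" ]; then
        echo connected > "$modefile"
    fi
}

# ifupdown exports IFACE; only act on infiniband interfaces.
case "$IFACE" in
    ib*) set_ipoib_mode "$IFACE" ;;
esac
```

Remember to make the hook executable (chmod +x) or ifupdown will ignore it.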
+
+
+<sect1>TCP tuning
+<p>
+In order to obtain maximum IPoIB throughput you may need to tweak the MTU and various kernel TCP buffer and window settings. 
+See the details in the ipoib_release_notes.txt document in the ofed-docs package.
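Such tuning is usually applied through /etc/sysctl.conf. The values below are placeholders for illustration only, not recommendations; take the actual numbers from the ofed-docs release notes.

```
# /etc/sysctl.conf fragment - example knobs only; see
# ipoib_release_notes.txt in ofed-docs for recommended values.
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```

Run sysctl -p to apply the settings without rebooting.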
+
+
+<sect>MPI
+<p>
+ADDME: How to run a test MPI application (notes: the rdma_ucm module must be loaded, and /dev/infiniband/rdma_cm must be accessible to users).
+
+<tscreen>
+<verb>
+ chmod a+rw /dev/infiniband/rdma_cm
+ mpirun --mca btl_openib_verbose 1 --mca btl ^tcp -n 2 -hostfile ~/hostfile /nfs/acari/gmpc/work/infiniband/scripts/hello
+</verb>
+</tscreen>
+
+<sect>SDP
+<p>
+Sockets Direct Protocol (SDP) is a network protocol which provides an RDMA-accelerated
+alternative to TCP over infiniband networks. OFED provides an LD_PRELOADable library 
+(libsdp.so) which allows programs which use TCP to use the more efficient SDP protocol instead.  
+The use of an LD_PRELOADable library means that the switch in protocol is transparent 
+and does not require the application to be recompiled.
+
+
+<sect1>Configuration
+<p>
+SDP uses IPoIB for address resolution, so you must configure IPoIB before using SDP. 
+
+You should also ensure the ib_sdp kernel module is loaded.
+<verb>
+modprobe ib_sdp
+</verb>
+
+
+You can use libsdp in two ways: either manually LD_PRELOAD the library when invoking your application, or
+create a config file which specifies which applications will use SDP.
+
+To manually LD_PRELOAD a library, simply set the LD_PRELOAD variable before invoking your application.
+<verb>
+LD_PRELOAD=libsdp.so ./path/to/your/application ...
+</verb>
+If you wish to choose which programs will use SDP, you can edit /etc/sdp.conf and specify which programs, ports and
+addresses are eligible.
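As an illustrative sketch only (the rule style below follows the comments shipped in libsdp's own sdp.conf; verify the exact syntax against the file on your system):

```
# Use SDP for both listening and connecting sockets of any
# program whose name matches NPtcp, on any address and port.
use both NPtcp *:*
# Leave everything else on plain TCP.
use tcp * *:*
```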
+
+
+<sect1>Example Using SDP with Netpipe
+<p>
+The following example shows how to use libsdp to make the TCP benchmarking application netpipe use SDP rather than TCP.
+NodeA is the server and NodeB is the client. IPoIB is configured on both nodes, and NodeA's IPoIB address is 10.0.0.1.
+
+Install netpipe on both nodes.
+<verb>
+aptitude install netpipe-tcp
+</verb>
+
+First, run the netpipe benchmark over TCP in order to obtain a baseline number.
+
+<tscreen>
+<verb>
+nodeA# NPtcp
+nodeB# NPtcp -h 10.0.0.1
+Send and receive buffers are 16384 and 87380 bytes
+(A bug in Linux doubles the requested buffer sizes)
+Now starting the main loop
+  0:       1 bytes   2778 times -->      0.22 Mbps in      34.04 usec
+  1:       2 bytes   2937 times -->      0.45 Mbps in      33.65 usec
+  2:       3 bytes   2971 times -->      0.69 Mbps in      33.41 usec
+&lt;snip&gt;
+121: 8388605 bytes      3 times -->   2951.89 Mbps in   21680.99 usec
+122: 8388608 bytes      3 times -->   3008.08 Mbps in   21276.00 usec
+123: 8388611 bytes      3 times -->   2941.76 Mbps in   21755.66 usec
+</verb>
+</tscreen>
+
+Now repeat the test, but force netpipe to use SDP rather than TCP.
+
+<tscreen>
+<verb>
+nodeA# LD_PRELOAD=libsdp.so NPtcp 
+nodeB# LD_PRELOAD=libsdp.so  NPtcp -h 10.0.0.1
+Send and receive buffers are 16384 and 87380 bytes
+(A bug in Linux doubles the requested buffer sizes)
+Now starting the main loop
+  0:       1 bytes   9765 times -->      1.45 Mbps in       5.28 usec
+  1:       2 bytes  18946 times -->      2.80 Mbps in       5.46 usec
+  2:       3 bytes  18323 times -->      4.06 Mbps in       5.63 usec
+&lt;snip&gt;
+121: 8388605 bytes      5 times -->   7665.51 Mbps in    8349.08 usec
+122: 8388608 bytes      5 times -->   7668.62 Mbps in    8345.70 usec
+123: 8388611 bytes      5 times -->   7629.04 Mbps in    8389.00 usec
+</verb>
+</tscreen>
+You should see a significant increase in performance when using SDP.
+
+<sect>SRP
+<p>
+SRP (the SCSI RDMA Protocol) is a protocol that allows the use of SCSI devices across
+infiniband. If you have infiniband storage, you can access the devices via SRP.
+<sect1>Configuration
+<p>
+Ensure that your infiniband storage is presented to the host in question. Check your storage controller documentation.
+Ensure that the ib_srp kernel module is loaded and that the srptools package is installed.
+
+<tscreen>
+modprobe ib_srp
+</tscreen>
+
+<sect1>SRP daemon configuration
+<p>
+srp_daemon is responsible for discovering and connecting to SRP targets. The default configuration shipped with srp_daemon ignores all presented
+devices; this is a failsafe to prevent devices from being mounted by accident on the wrong hosts.
+
+The srp_daemon config file /etc/srp_daemon.conf has a simple syntax, and is described in the srp_daemon(1) manpage. Each line in the file is a rule which 
+either allows or disallows a connection, according to the first character of the line (a or d respectively) and the ID of the storage device.
+
+<sect2>Determine the IDs of presented devices
+<p>
+You can determine the IDs of SRP devices presented to your hosts by running the ibsrpdm -c command.
+<tscreen>
+<verb>
+# ibsrpdm -c
+id_ext=50001ff10005052a,ioc_guid=50001ff10005052a,dgid=fe8000000000000050001ff10005052a,pkey=ffff,service_id=2a050500f11f0050
+</verb>
+</tscreen>
+
+<sect2>Configure srp_daemon to connect to the devices
+<p>
+Once we have the IDs of the devices, we can add them to /etc/srp_daemon.conf. You can also specify other SRP-related
+options for the target, such as max_cmd_per_lun and max_sect. These are storage-specific; check your vendor documentation 
+for recommended values.
+<tscreen>
+<verb>
+# This rule allows connection to our target
+a id_ext=50001ff10005052a,ioc_guid=50001ff10005052a,max_cmd_per_lun=32,max_sect=65535
+# This rule disallows everything else
+d
+</verb>
+</tscreen>
+Restart srp_daemon and the storage target should become visible; check the kernel log to see if the disk has been detected.
+
+
+<verb>
+/etc/init.d/srptools restart
+</verb>
+
+In the example kernel log output below, the disk has been discovered as scsi device sdb.
+<tscreen>
+<verb>
+scsi 3:0:0:1: Direct-Access     IBM      DCS9900          5.03 PQ: 0 ANSI: 5
+sd 3:0:0:1: [sdb] 1953458176 4096-byte hardware sectors (8001365 MB)
+sd 3:0:0:1: [sdb] Write Protect is off
+sd 3:0:0:1: [sdb] Mode Sense: 97 00 10 08
+sd 3:0:0:1: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
+sd 3:0:0:1: [sdb] 1953458176 4096-byte hardware sectors (8001365 MB)
+sd 3:0:0:1: [sdb] Write Protect is off
+sd 3:0:0:1: [sdb] Mode Sense: 97 00 10 08
+sd 3:0:0:1: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
+ sdb:<6>scsi4 : SRP.T10:50001FF10005052A
+ unknown partition table
+sd 3:0:0:1: [sdb] Attached SCSI disk
+sd 3:0:0:1: Attached scsi generic sg5 type 0
+</verb>
+</tscreen>
+
+<sect1>Multipathing, LVM and formatting.
+<p>
+The newly detected SRP device can be treated as an other scsi device. If you have multiple infiniband adapters you can use multipath-tools 
+on top of the SRP devices to protects against a network failure.  If you are not using multipathed IO you can simply format the device as normal.
+
+<sect>Building Lustre against OFED
+<p>
+Lustre is a scalable cluster filesystem popular on high performance compute clusters. See <url url="http://www.lustre.org" name="http://www.lustre.org">
+for more information. Lustre can use infiniband as one of its network transports in order to increase performance. This section describes how to compile Lustre 
+against the OFED infiniband stack.
+<sect1>Check Compatibility
+<p>
+<sect1>Install the OFED packages
+<p>
+<sect1>Install a set of kernel modules
+<p>
+<sect1>Configure lustre
+<p>
+
+
+
+<sect>Network Troubleshooting
+<p>
+Diags:
+ibdiagnet -r
+
+
+Applications
+
+ Lustre over IB.
+
+ Example MPI application.
+
+
+ openmpi-dev
+
+
+
+SDP
+
+<sect>Further Information
+<p>
+</article>
+



