[Pkg-ofed-devel] lenny - openmpi problems

Yann JOBIC jobic at polytech.univ-mrs.fr
Tue Sep 15 15:19:05 UTC 2009


Guy Coates wrote:
>
>> I installed the package, and now I cannot ibping, and ibstat isn't working:
>>
>> Lilou:~# ibstat
>> ibpanic: [6594] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
>
> You will need to reboot once the kernel module package has been
> installed. Assuming that you have done that, is there anything odd in
> /var/log/messages / dmesg?
>
> Cheers,
>
> Guy
>
Maybe I loaded the wrong modules?

Lidia:~# lsmod | grep mlx
mlx4_ib                61632  0
ib_mad                 39336  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core                70656  10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
mlx4_core              97332  1 mlx4_ib

Lidia:~# lsmod | grep ib
ib_ipoib               78048  0
inet_lro               12800  1 ib_ipoib
ipv6                  288328  81 ib_ipoib
ib_umad                17576  8
ib_cm                  39208  2 ib_ipoib,rdma_cm
ib_sa                  42280  3 ib_ipoib,rdma_cm,ib_cm
ib_addr                11144  1 rdma_cm
ib_uverbs              41552  1 rdma_ucm
mlx4_ib                61632  0
ib_mad                 39336  4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core                70656  10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
mlx4_core              97332  1 mlx4_ib
libata                165600  1 ata_generic
scsi_mod              160760  5 sd_mod,mptsas,mptscsih,scsi_transport_sas,libata
dock                   14112  1 libata

Lidia:~# lsmod | grep rdma
rdma_ucm               15936  0
rdma_cm                34068  1 rdma_ucm
ib_cm                  39208  2 ib_ipoib,rdma_cm
iw_cm                  13704  1 rdma_cm
ib_sa                  42280  3 ib_ipoib,rdma_cm,ib_cm
ib_addr                11144  1 rdma_cm
ib_uverbs              41552  1 rdma_ucm
ib_core                70656  10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad


OpenSM is not starting correctly:
******************************************************************
****************** ERRORS DURING INITIALIZATION ******************
******************************************************************


Sep 15 17:09:34 621048 [519C0950] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x8cee00 of size 256 failed -5 (Invalid argument)
Sep 15 17:09:34 621068 [519C0950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR)
Sep 15 17:09:34 621089 [519C0950] 0x01 -> SMP dump:
                                base_ver................0x1
                                mgmt_class..............0x81
                                class_ver...............0x1
                                method..................0x1 (SubnGet)
                                D bit...................0x0
                                status..................0x0
                                hop_ptr.................0x0
                                hop_count...............0x0
                                trans_id................0x1245
                                attr_id.................0x11 (NodeInfo)
                                resv....................0x0
                                attr_mod................0x0
                                m_key...................0x0000000000000000
                                dr_slid.................65535
                                dr_dlid.................65535

                                Initial path: 0
                                Return path:  0
                                Reserved:     [0][0][0][0][0][0][0]

                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00
                                00 00 00 00 00 00 00 00   00 00 00 00 00 00 00 00

Sep 15 17:09:34 621101 [519C0950] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)


And in the syslog:
Sep 15 17:08:28 Lidia OpenSM[6107]: /var/log/opensm.0x0003ba000100c02d.log log file opened
Sep 15 17:08:28 Lidia OpenSM[6107]: OpenSM 3.2.6_20090317#012
Sep 15 17:08:28 Lidia OpenSM[6110]: /var/log/opensm.0x0003ba000100c02e.log log file opened
Sep 15 17:08:28 Lidia OpenSM[6110]: OpenSM 3.2.6_20090317#012
Sep 15 17:08:28 Lidia OpenSM[6107]: Entering DISCOVERING state#012
Sep 15 17:08:28 Lidia OpenSM[6110]: Entering DISCOVERING state#012
Sep 15 17:08:28 Lidia kernel: [   46.506594] ib_mad: Method 1 already in use
Sep 15 17:08:28 Lidia kernel: [   46.622598] ib_mad: Method 1 already in use
Sep 15 17:08:28 Lidia OpenSM[6107]: Exiting SM#012
Sep 15 17:08:28 Lidia OpenSM[6110]: Exiting SM#012
Sep 15 17:08:33 Lidia kernel: [   54.259857] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
Sep 15 17:08:34 Lidia OpenSM[5071]: Entering MASTER state#012
Sep 15 17:08:34 Lidia OpenSM[5074]: Entering MASTER state#012
Sep 15 17:08:35 Lidia kernel: [   57.411566] eth0: no IPv6 routers present
Sep 15 17:08:44 Lidia kernel: [   68.509666] ib_query_port failed (-16) for mlx4_0
Sep 15 17:08:54 Lidia kernel: [   80.077614] Couldn't query port
Sep 15 17:08:54 Lidia kernel: [   80.077642] ib0: ib_query_gid() failed
Sep 15 17:08:55 Lidia ibstat: ibpanic: [6565] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
Sep 15 17:09:04 Lidia kernel: [   93.364796] ib_query_port failed (-16) for mlx4_0
Sep 15 17:09:04 Lidia kernel: [   93.364935] ib0: ib_query_port failed
Sep 15 17:09:14 Lidia OpenSM[5071]: Errors during initialization#012
Sep 15 17:09:16 Lidia kernel: [  107.413184] ib0: ib_query_gid() failed
Sep 15 17:09:44 Lidia kernel: [  143.443054] ib0: ib_query_port failed
Sep 15 17:10:08 Lidia kernel: [  175.558610] ib0: ib_query_gid() failed
Sep 15 17:10:18 Lidia OpenSM[5074]: Errors during initialization#012

Cheers,

Yann


