[Pkg-ofed-devel] lenny - openmpi problems
Yann JOBIC
jobic at polytech.univ-mrs.fr
Tue Sep 15 15:19:05 UTC 2009
Guy Coates wrote:
>
>> I installed the package, and now I cannot ibping, and ibstat isn't
>> working:
>>
>> Lilou:~# ibstat
>> ibpanic: [6594] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
>
> You will need to reboot once the kernel module package has been
> installed. Assuming that you have done that, is there anything odd in
> /var/log/messages / dmesg?
>
> Cheers,
>
> Guy
>
Maybe I loaded the wrong modules?
Lidia:~# lsmod | grep mlx
mlx4_ib 61632 0
ib_mad 39336 4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core 70656 10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
mlx4_core 97332 1 mlx4_ib
Lidia:~# lsmod | grep ib
ib_ipoib 78048 0
inet_lro 12800 1 ib_ipoib
ipv6 288328 81 ib_ipoib
ib_umad 17576 8
ib_cm 39208 2 ib_ipoib,rdma_cm
ib_sa 42280 3 ib_ipoib,rdma_cm,ib_cm
ib_addr 11144 1 rdma_cm
ib_uverbs 41552 1 rdma_ucm
mlx4_ib 61632 0
ib_mad 39336 4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core 70656 10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
mlx4_core 97332 1 mlx4_ib
libata 165600 1 ata_generic
scsi_mod 160760 5 sd_mod,mptsas,mptscsih,scsi_transport_sas,libata
dock 14112 1 libata
Lidia:~# lsmod | grep rdma
rdma_ucm 15936 0
rdma_cm 34068 1 rdma_ucm
ib_cm 39208 2 ib_ipoib,rdma_cm
iw_cm 13704 1 rdma_cm
ib_sa 42280 3 ib_ipoib,rdma_cm,ib_cm
ib_addr 11144 1 rdma_cm
ib_uverbs 41552 1 rdma_ucm
ib_core 70656 10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
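The question of whether the right modules are loaded can be checked mechanically rather than by reading three grep outputs. A minimal sketch in Python, assuming the EXPECTED set below (it is taken from the modules visible in the listings above and is only an illustration, not an authoritative list for this hardware):

```python
# Sketch: check lsmod-style output for a set of expected InfiniBand modules.
# EXPECTED is an assumption based on the listings above, not an official list.
EXPECTED = {"mlx4_core", "mlx4_ib", "ib_core", "ib_mad", "ib_umad",
            "ib_uverbs", "ib_ipoib", "rdma_cm", "rdma_ucm"}


def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # A module line is "name size use_count [used_by]"; this skips the
        # header line and any wrapped continuation lines.
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            names.add(fields[0])
    return names


def missing_modules(lsmod_text, expected=EXPECTED):
    """Return the expected modules that do not appear in lsmod output."""
    return sorted(set(expected) - loaded_modules(lsmod_text))


# Shortened sample in the same shape as the output above.
SAMPLE = """\
Module                  Size  Used by
mlx4_ib                61632  0
ib_mad                 39336  4 ib_umad,ib_cm,ib_sa,mlx4_ib
mlx4_core              97332  1 mlx4_ib
"""

if __name__ == "__main__":
    print(missing_modules(SAMPLE))
```

On the machine itself the text could come from subprocess.run(["lsmod"], capture_output=True, text=True).stdout. An empty result only says the expected modules are present; it does not say they are the right versions for the installed kernel.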
OpenSM is not loading correctly:
******************************************************************
****************** ERRORS DURING INITIALIZATION ******************
******************************************************************
Sep 15 17:09:34 621048 [519C0950] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x8cee00 of size 256 failed -5 (Invalid argument)
Sep 15 17:09:34 621068 [519C0950] 0x01 -> __osm_sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR)
Sep 15 17:09:34 621089 [519C0950] 0x01 -> SMP dump:
base_ver................0x1
mgmt_class..............0x81
class_ver...............0x1
method..................0x1 (SubnGet)
D bit...................0x0
status..................0x0
hop_ptr.................0x0
hop_count...............0x0
trans_id................0x1245
attr_id.................0x11 (NodeInfo)
resv....................0x0
attr_mod................0x0
m_key...................0x0000000000000000
dr_slid.................65535
dr_dlid.................65535
Initial path: 0
Return path: 0
Reserved: [0][0][0][0][0][0][0]
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Sep 15 17:09:34 621101 [519C0950] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)
And in the syslog:
Sep 15 17:08:28 Lidia OpenSM[6107]: /var/log/opensm.0x0003ba000100c02d.log log file opened
Sep 15 17:08:28 Lidia OpenSM[6107]: OpenSM 3.2.6_20090317#012
Sep 15 17:08:28 Lidia OpenSM[6110]: /var/log/opensm.0x0003ba000100c02e.log log file opened
Sep 15 17:08:28 Lidia OpenSM[6110]: OpenSM 3.2.6_20090317#012
Sep 15 17:08:28 Lidia OpenSM[6107]: Entering DISCOVERING state#012
Sep 15 17:08:28 Lidia OpenSM[6110]: Entering DISCOVERING state#012
Sep 15 17:08:28 Lidia kernel: [ 46.506594] ib_mad: Method 1 already in use
Sep 15 17:08:28 Lidia kernel: [ 46.622598] ib_mad: Method 1 already in use
Sep 15 17:08:28 Lidia OpenSM[6107]: Exiting SM#012
Sep 15 17:08:28 Lidia OpenSM[6110]: Exiting SM#012
Sep 15 17:08:33 Lidia kernel: [ 54.259857] warning: `ntpd' uses 32-bit capabilities (legacy support in use)
Sep 15 17:08:34 Lidia OpenSM[5071]: Entering MASTER state#012
Sep 15 17:08:34 Lidia OpenSM[5074]: Entering MASTER state#012
Sep 15 17:08:35 Lidia kernel: [ 57.411566] eth0: no IPv6 routers present
Sep 15 17:08:44 Lidia kernel: [ 68.509666] ib_query_port failed (-16) for mlx4_0
Sep 15 17:08:54 Lidia kernel: [ 80.077614] Couldn't query port
Sep 15 17:08:54 Lidia kernel: [ 80.077642] ib0: ib_query_gid() failed
Sep 15 17:08:55 Lidia ibstat: ibpanic: [6565] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
Sep 15 17:09:04 Lidia kernel: [ 93.364796] ib_query_port failed (-16) for mlx4_0
Sep 15 17:09:04 Lidia kernel: [ 93.364935] ib0: ib_query_port failed
Sep 15 17:09:14 Lidia OpenSM[5071]: Errors during initialization#012
Sep 15 17:09:16 Lidia kernel: [ 107.413184] ib0: ib_query_gid() failed
Sep 15 17:09:44 Lidia kernel: [ 143.443054] ib0: ib_query_port failed
Sep 15 17:10:08 Lidia kernel: [ 175.558610] ib0: ib_query_gid() failed
Sep 15 17:10:18 Lidia OpenSM[5074]: Errors during initialization#012
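The syslog excerpt above interleaves messages from several OpenSM processes (PIDs 5071, 5074, 6107 and 6110) with kernel errors, which makes it hard to follow by eye. A minimal sketch in Python that groups the OpenSM messages by PID and translates the kernel's numeric codes into errno names; the regular expressions are assumptions fitted to the lines shown here, not a general syslog parser. On Linux, -16 maps to EBUSY, which matches the "Device or resource busy" text printed by ibstat.

```python
import errno
import re
from collections import defaultdict

# Matches the "OpenSM[pid]: message" part of a syslog line (assumed format).
OPENSM_RE = re.compile(r"OpenSM\[(\d+)\]:\s*(.*)")
# Matches kernel messages of the form "... failed (-16)" (assumed format).
KERNEL_ERR_RE = re.compile(r"failed \(-(\d+)\)")


def opensm_events(lines):
    """Group OpenSM messages by process ID, in order of appearance."""
    events = defaultdict(list)
    for line in lines:
        m = OPENSM_RE.search(line)
        if m:
            # "#012" is the octal escape the syslog daemon wrote for a newline.
            text = m.group(2).replace("#012", "").strip()
            events[int(m.group(1))].append(text)
    return dict(events)


def kernel_errnos(lines):
    """Return symbolic errno names for 'failed (-N)' kernel messages."""
    names = []
    for line in lines:
        m = KERNEL_ERR_RE.search(line)
        if m:
            names.append(errno.errorcode.get(int(m.group(1)), "UNKNOWN"))
    return names


# A few lines copied from the excerpt above.
SAMPLE = [
    "Sep 15 17:08:28 Lidia OpenSM[6107]: Entering DISCOVERING state#012",
    "Sep 15 17:08:28 Lidia OpenSM[6110]: Entering DISCOVERING state#012",
    "Sep 15 17:08:28 Lidia OpenSM[6107]: Exiting SM#012",
    "Sep 15 17:08:34 Lidia OpenSM[5071]: Entering MASTER state#012",
    "Sep 15 17:08:44 Lidia kernel: [ 68.509666] ib_query_port failed (-16) for mlx4_0",
]

if __name__ == "__main__":
    for pid, messages in sorted(opensm_events(SAMPLE).items()):
        print(pid, messages)
    print(kernel_errnos(SAMPLE))
```

Grouping by PID only makes the sequence per process easier to read; it does not by itself explain why more than one OpenSM process is running.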
Cheers,
Yann