[Pkg-ofed-devel] lenny - openmpi problems
jobic
jobic at polytech.univ-mrs.fr
Tue Sep 15 16:03:46 UTC 2009
Guy Coates wrote:
> Yann JOBIC wrote:
>> Guy Coates wrote:
>>>
>>>> I installed the package, and now I cannot ibping, and ibstat isn't
>>>> working:
>>>>
>>>> Lilou:~# ibstat
>>>> ibpanic: [6594] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
>>>
>>> You will need to reboot once the kernel module package has been
>>> installed. Assuming that you have done that, is there anything odd
>>> in /var/log/messages / dmesg?
>>>
>>> Cheers,
>>>
>>> Guy
>>>
>> Maybe I loaded the wrong modules?
>>
>> Lidia:~# lsmod | grep mlx
>> mlx4_ib 61632 0
>> ib_mad 39336 4 ib_umad,ib_cm,ib_sa,mlx4_ib
>> ib_core 70656 10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
>>
>
> Those module sizes look correct; they match what I have on my machine
> (you can double check them with modinfo if you are still unsure).
>
> Are there any unusual messages in the kernel log when the infiniband
> modules are loaded? My machine shows just these messages:
>
>
> [ 2.291810] mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
> [ 2.291810] mlx4_core: Initializing 0000:0c:00.0
> [ 3.861825] mlx4_core 0000:0c:00.0: Requested number of MACs is too much for port 1, reducing to 1.
> [ 5.035171] ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 18 (level, low) -> IRQ 18
> [ 5.035171] PCI: Setting latency timer of device 0000:00:1d.0 to 64
> [ 34.998522] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>
I get:
Sep 15 17:32:44 Lilou kernel: [ 5.062079] mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
Sep 15 17:32:44 Lilou kernel: [ 5.062079] mlx4_core: Initializing 0000:03:00.0
Sep 15 17:32:44 Lilou kernel: [ 5.062079] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 19
Sep 15 17:32:44 Lilou kernel: [ 5.062079] ACPI: PCI Interrupt 0000:03:00.0[A] -> Link [LNKD] -> GSI 19 (level, low) -> IRQ 19
Sep 15 17:32:44 Lilou kernel: [ 5.062079] PCI: Setting latency timer of device 0000:03:00.0 to 64
Sep 15 17:32:44 Lilou kernel: [ 41.509307] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>
> Did you install the new kernel modules on both of your test hosts?
Yes, the second machine was cloned with dd from the installed one.
>
> As an outside chance, have you made sure that your infiniband card
> firmware is all up to date?
I have this firmware:
CA 'mlx4_0'
CA type: MT25418
Number of ports: 2
Firmware version: 2.5.100
Hardware version: a0
Node GUID: 0x0003ba000100bec0
System image GUID: 0x0003ba000100bec3
I updated the firmware two weeks ago.
The thing is, ibstat is not working, and therefore neither is ibping. But I can
see the ib0 and ib1 interfaces, with their addresses configured.
With the modules shipped with the repository, those were working.
The build process should be fine, but I don't know whether there are
any logs of it that I can check.
gcc is 4.3.2
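As a rough way to double-check the "wrong modules" question, the sketch below parses lsmod-style output and lists expected modules that appear nowhere in it, neither as a loaded module nor as a user of one. The EXPECTED set and the helper names are assumptions made for illustration, not an official list for this hardware. SAMPLE is the lsmod listing quoted above, with the wrapped ib_core line rejoined.

```python
# Hypothetical helper. EXPECTED is an assumed set of module names,
# chosen for illustration; adjust it to the stack actually in use.
EXPECTED = {"mlx4_core", "mlx4_ib", "ib_core", "ib_mad", "ib_umad", "ib_uverbs"}


def parse_lsmod(text):
    """Return {module: set of users} from lsmod-style output."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Module Size Used by" header, and
        # any row whose second column is not a numeric size.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        users = set(fields[3].split(",")) if len(fields) > 3 else set()
        users.discard("")
        modules[fields[0]] = users
    return modules


def missing_modules(text, expected=EXPECTED):
    """List expected modules seen neither as a row nor as a user."""
    modules = parse_lsmod(text)
    seen = set(modules)
    for users in modules.values():
        seen |= users
    return sorted(set(expected) - seen)


# The listing from this thread, with the wrapped line rejoined.
SAMPLE = """\
mlx4_ib 61632 0
ib_mad 39336 4 ib_umad,ib_cm,ib_sa,mlx4_ib
ib_core 70656 10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
"""

if __name__ == "__main__":
    print(missing_modules(SAMPLE))
```

A name in the result is only a hint to look further: a driver compiled into the kernel never shows up in lsmod, and a listing filtered through grep hides every row that does not match the pattern.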
Cheers
Yann