[Pkg-ofed-devel] lenny - openmpi problems

jobic jobic at polytech.univ-mrs.fr
Tue Sep 15 16:03:46 UTC 2009


Guy Coates wrote:
> Yann JOBIC wrote:
>> Guy Coates wrote:
>>>
>>>> I installed the package, and now I cannot ibping, and ibstat isn't 
>>>> working:
>>>>
>>>> Lilou:~# ibstat
>>>> ibpanic: [6594] main: stat of IB device 'mlx4_0' failed: (Device or 
>>>> resource busy)
>>>
>>> You will need to reboot once the kernel module package has been 
>>> installed. Assuming that you have done that, is there anything odd 
>>> in /var/log/messages / dmesg?
>>>
>>> Cheers,
>>>
>>> Guy
>>>
>> Maybe I loaded the wrong modules?
>>
>> Lidia:~# lsmod | grep mlx
>> mlx4_ib                61632  0
>> ib_mad                 39336  4 ib_umad,ib_cm,ib_sa,mlx4_ib
>> ib_core                70656  10 ib_ipoib,ib_umad,rdma_ucm,rdma_cm,ib_cm,iw_cm,ib_sa,ib_uverbs,mlx4_ib,ib_mad
>>
>
> Those module sizes look correct; they match what I have on my machine 
> (you can double check them with modinfo if you are still unsure).
>
> Are there any unusual messages in the kernel log when the infiniband 
> modules are loaded? My machine shows just these messages:
>
>
> [   2.291810] mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
> [    2.291810] mlx4_core: Initializing 0000:0c:00.0
> [    3.861825] mlx4_core 0000:0c:00.0: Requested number of MACs is too much for port 1, reducing to 1.
> [    5.035171] ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 18 (level, low) -> IRQ 18
> [    5.035171] PCI: Setting latency timer of device 0000:00:1d.0 to 64
> [   34.998522] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
>
I've got:
Sep 15 17:32:44 Lilou kernel: [    5.062079] mlx4_core: Mellanox ConnectX core driver v1.0 (April 4, 2008)
Sep 15 17:32:44 Lilou kernel: [    5.062079] mlx4_core: Initializing 0000:03:00.0
Sep 15 17:32:44 Lilou kernel: [    5.062079] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 19
Sep 15 17:32:44 Lilou kernel: [    5.062079] ACPI: PCI Interrupt 0000:03:00.0[A] -> Link [LNKD] -> GSI 19 (level, low) -> IRQ 19
Sep 15 17:32:44 Lilou kernel: [    5.062079] PCI: Setting latency timer of device 0000:03:00.0 to 64
Sep 15 17:32:44 Lilou kernel: [   41.509307] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
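
A rough sketch of how lines like these can be pulled out of the kernel log, for anyone comparing their own output. The function name mlx4_msgs is made up for illustration, and the log path in the usage comment is an assumption based on the /var/log/messages location mentioned earlier in the thread:

```shell
# Hypothetical helper: keep only the mlx4 driver lines
# from a kernel log read on standard input.
mlx4_msgs() {
    grep -E 'mlx4_(core|ib)'
}

# Possible usage (paths are assumptions; adjust for your system):
#   mlx4_msgs < /var/log/messages
#   dmesg | mlx4_msgs
```

This only filters by driver name, so unrelated ACPI/PCI lines for the same device are dropped; widen the pattern if those are needed too.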

>
> Did you install the new kernel modules on both of your test hosts?
Yes, the second machine was cloned with dd from the installed one.
>
> As an outside chance, have you made sure that your infiniband card 
> firmware is all up to date?
I have this firmware:

CA 'mlx4_0'
        CA type: MT25418
        Number of ports: 2
        Firmware version: 2.5.100
        Hardware version: a0
        Node GUID: 0x0003ba000100bec0
        System image GUID: 0x0003ba000100bec3


I updated the firmware 2 weeks ago.

The thing is, ibstat is not working, and therefore neither is ibping. But I can 
see the ib0 and ib1 interfaces, with the addresses configured.
With the modules shipped with the repository, those were working.
The build process should be fine; however, I don't know whether I can see 
any logs from it.
gcc is 4.3.2
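
One way to double-check that an installed module was built for the running kernel, as a rough sketch only: compare the module's vermagic string with the kernel release. The helper name vermagic_matches is hypothetical, and the kernel version shown in the comments is an example value, not taken from these machines:

```shell
# Hypothetical helper: succeed if a module's vermagic string
# begins with the given kernel release.
#   $1: vermagic string, e.g. from `modinfo -F vermagic mlx4_ib`
#   $2: kernel release,  e.g. from `uname -r`
vermagic_matches() {
    case "$1" in
        "$2"|"$2 "*) return 0 ;;   # exact match, or release followed by flags
        *)           return 1 ;;
    esac
}

# Possible usage (example only):
#   vermagic_matches "$(modinfo -F vermagic mlx4_ib)" "$(uname -r)" \
#       && echo "mlx4_ib matches the running kernel"
#   modinfo -n mlx4_ib    # prints the path of the module file found for this kernel
```

A mismatch would suggest the module file on disk belongs to a different kernel build than the one currently running.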

Cheers

Yann


