[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed

Simon Kainz simon.kainz at tugraz.at
Tue Oct 13 09:01:34 UTC 2009


Guy Coates wrote:
> Mario Lang wrote:
>> Guy Coates <gmpc at sanger.ac.uk> writes:
>>
>>> Also, if you add the GUIDs into /etc/default/opensm, you should avoid
>>> the ibstat call in the opensm init script altogether.
>> Yes, I know, and that is what I actually did.
>>
>> However, ibstat failing is, as you wrote in another mail in this thread,
>> pointing to something very fundamental being broken.
>>
> Could you post an strace of the failing ibstat?
> 
> Cheers,
> 
> Guy
> 
Hello. We tried some of your suggestions, in particular removing all
modules and then using only mlx4_ib and ib_umad. Please see below.

Loading mlx4_core with debug output enabled:

[  380.184470] mlx4_core 0000:82:00.0: FW version 2.6.900 (cmd intf rev 3), max commands 16
[  380.184478] mlx4_core 0000:82:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[  380.184481] mlx4_core 0000:82:00.0: FW size 385 KB
[  380.184485] mlx4_core 0000:82:00.0: Clear int @ f0058, BAR 0
[  380.186575] mlx4_core 0000:82:00.0: Mapped 26 chunks/6168 KB for FW.
[  381.242827] mlx4_core 0000:82:00.0: BlueFlame available (reg size 512, regs/page 256)
[  381.242987] mlx4_core 0000:82:00.0: Base MM extensions: flags 00000cc0, rsvd L_Key 00000500
[  381.242994] mlx4_core 0000:82:00.0: Max ICM size 4294967296 MB
[  381.242997] mlx4_core 0000:82:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[  381.243004] mlx4_core 0000:82:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[  381.243009] mlx4_core 0000:82:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[  381.243015] mlx4_core 0000:82:00.0: Max EQs: 512, reserved EQs: 4, entry size: 128
[  381.243021] mlx4_core 0000:82:00.0: reserved MPTs: 16, reserved MTTs: 16
[  381.243027] mlx4_core 0000:82:00.0: Max PDs: 8388608, reserved PDs: 4, reserved UARs: 1
[  381.243033] mlx4_core 0000:82:00.0: Max QP/MCG: 8388608, reserved MGMs: 0
[  381.243039] mlx4_core 0000:82:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[  381.243045] mlx4_core 0000:82:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[  381.243051] mlx4_core 0000:82:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[  381.243057] mlx4_core 0000:82:00.0: Max RQ desc size: 512, max RQ S/G: 32
[  381.243063] mlx4_core 0000:82:00.0: Max GSO size: 131072
[  381.243068] mlx4_core 0000:82:00.0: DEV_CAP flags:
[  381.243074] mlx4_core 0000:82:00.0:     RC transport
[  381.243079] mlx4_core 0000:82:00.0:     UC transport
[  381.243084] mlx4_core 0000:82:00.0:     UD transport
[  381.243086] mlx4_core 0000:82:00.0:     XRC transport
[  381.243092] mlx4_core 0000:82:00.0:     FCoIB support
[  381.243097] mlx4_core 0000:82:00.0:     SRQ support
[  381.243103] mlx4_core 0000:82:00.0:     IPoIB checksum offload
[  381.243108] mlx4_core 0000:82:00.0:     P_Key violation counter
[  381.243113] mlx4_core 0000:82:00.0:     Q_Key violation counter
[  381.243119] mlx4_core 0000:82:00.0:     APM support
[  381.243124] mlx4_core 0000:82:00.0:     Atomic ops support
[  381.243129] mlx4_core 0000:82:00.0:     Address vector port checking support
[  381.243135] mlx4_core 0000:82:00.0:     UD multicast support
[  381.243140] mlx4_core 0000:82:00.0:     Router support
[  381.243155] mlx4_core 0000:82:00.0:   profile[ 0] (  CMPT): 2^26 entries @ 0x         0, size 0x 100000000
[  381.243162] mlx4_core 0000:82:00.0:   profile[ 1] (RDMARC): 2^22 entries @ 0x 100000000, size 0x   8000000
[  381.243168] mlx4_core 0000:82:00.0:   profile[ 2] (    QP): 2^18 entries @ 0x 108000000, size 0x   4000000
[  381.243175] mlx4_core 0000:82:00.0:   profile[ 3] (   MTT): 2^20 entries @ 0x 10c000000, size 0x   4000000
[  381.243181] mlx4_core 0000:82:00.0:   profile[ 4] (  DMPT): 2^19 entries @ 0x 110000000, size 0x   2000000
[  381.243188] mlx4_core 0000:82:00.0:   profile[ 5] (  ALTC): 2^18 entries @ 0x 112000000, size 0x   1000000
[  381.243194] mlx4_core 0000:82:00.0:   profile[ 6] (   SRQ): 2^16 entries @ 0x 113000000, size 0x    800000
[  381.243200] mlx4_core 0000:82:00.0:   profile[ 7] (    CQ): 2^16 entries @ 0x 113800000, size 0x    800000
[  381.243207] mlx4_core 0000:82:00.0:   profile[ 8] (   MCG): 2^13 entries @ 0x 114000000, size 0x    200000
[  381.243213] mlx4_core 0000:82:00.0:   profile[ 9] (  AUXC): 2^18 entries @ 0x 114200000, size 0x     40000
[  381.243219] mlx4_core 0000:82:00.0:   profile[10] (    EQ): 2^06 entries @ 0x 114240000, size 0x      2000
[  381.243225] mlx4_core 0000:82:00.0: HCA context memory: reserving 4524296 KB
[  381.243252] mlx4_core 0000:82:00.0: 4524296 KB of HCA context requires 8876 KB aux memory.
[  381.278964] mlx4_core 0000:82:00.0: Mapped 38 chunks/8876 KB for ICM aux.
[  381.279959] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 0 for ICM.
[  381.281767] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 40000000 for ICM.
[  381.283579] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 80000000 for ICM.
[  381.283622] mlx4_core 0000:82:00.0: Mapped 1 chunks/4 KB at c0000000 for ICM.
[  381.283665] mlx4_core 0000:82:00.0: Mapped page at 20264a8000 to 114240000 for ICM.
[  381.283704] mlx4_core 0000:82:00.0: Mapped page at 20264a8000 to 114241000 for ICM.
[  381.285482] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 10c000000 for ICM.
[  381.287393] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 110000000 for ICM.
[  381.289186] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 108000000 for ICM.
[  381.290969] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114200000 for ICM.
[  381.292754] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 112000000 for ICM.
[  381.294539] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 100000000 for ICM.
[  381.296319] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 113800000 for ICM.
[  381.298113] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 113000000 for ICM.
[  381.299896] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114000000 for ICM.
[  381.303059] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114040000 for ICM.
[  381.303556] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114080000 for ICM.
[  381.305345] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 1140c0000 for ICM.
[  381.307125] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114100000 for ICM.
[  381.308902] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114140000 for ICM.
[  381.310681] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114180000 for ICM.
[  381.312468] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 1141c0000 for ICM.
[  382.191800] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 10c040000 for ICM.
[  382.202425] mlx4_core 0000:82:00.0: NOP command IRQ test passed
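
As a sanity check on the log above, the profile[] regions can be summed and
compared with the "HCA context memory: reserving 4524296 KB" line. The
following is only a sketch in Python, not something that was run as part of
this report; the helper names (parse_profiles, total_kb, is_contiguous) are
made up for illustration, and the input is the profile[] text copied from the
dmesg output with the timestamp and device prefix dropped.

```python
import re

# profile[] lines copied from the dmesg output (timestamp/device prefix dropped).
PROFILE_LOG = """
profile[ 0] (  CMPT): 2^26 entries @ 0x         0, size 0x 100000000
profile[ 1] (RDMARC): 2^22 entries @ 0x 100000000, size 0x   8000000
profile[ 2] (    QP): 2^18 entries @ 0x 108000000, size 0x   4000000
profile[ 3] (   MTT): 2^20 entries @ 0x 10c000000, size 0x   4000000
profile[ 4] (  DMPT): 2^19 entries @ 0x 110000000, size 0x   2000000
profile[ 5] (  ALTC): 2^18 entries @ 0x 112000000, size 0x   1000000
profile[ 6] (   SRQ): 2^16 entries @ 0x 113000000, size 0x    800000
profile[ 7] (    CQ): 2^16 entries @ 0x 113800000, size 0x    800000
profile[ 8] (   MCG): 2^13 entries @ 0x 114000000, size 0x    200000
profile[ 9] (  AUXC): 2^18 entries @ 0x 114200000, size 0x     40000
profile[10] (    EQ): 2^06 entries @ 0x 114240000, size 0x      2000
"""

# \s+ between tokens also tolerates the line wrapping added by mail clients.
PROFILE_RE = re.compile(
    r"profile\[\s*(\d+)\]\s+\(\s*(\w+)\):\s+2\^(\d+)\s+entries\s+@\s+"
    r"0x\s*([0-9a-f]+),\s+size\s+0x\s*([0-9a-f]+)"
)


def parse_profiles(text):
    """Return one (name, start_offset, size_in_bytes) tuple per profile[] line."""
    return [
        (m.group(2), int(m.group(4), 16), int(m.group(5), 16))
        for m in PROFILE_RE.finditer(text)
    ]


def total_kb(profiles):
    """Sum of all region sizes, in KB."""
    return sum(size for _, _, size in profiles) // 1024


def is_contiguous(profiles):
    """True if every region starts exactly where the previous one ends."""
    expected = 0
    for _, start, size in profiles:
        if start != expected:
            return False
        expected = start + size
    return True


profiles = parse_profiles(PROFILE_LOG)
print(total_kb(profiles))       # 4524296, same as the "reserving" line
print(is_contiguous(profiles))  # True
```

The regions add up to the reserved total and are laid out back to back, so
the profile part of the log is at least internally consistent.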



After loading mlx4_ib, dmesg shows:

[  578.247698] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)


node1:~# ibstat
ibpanic: [5180] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)

Please find the strace log of ibstat attached.

The call

open("/sys/class/infiniband/mlx4_0/sys_image_guid", O_RDONLY) = 3

hangs for several seconds.
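
One way to narrow down which sysfs attributes are slow would be to time a
read of every file directly under the device directory. The following is a
minimal Python sketch of that idea, not something taken from the strace; the
function name time_attribute_reads is hypothetical, and the only thing taken
from the output above is the path /sys/class/infiniband/mlx4_0.

```python
import os
import time


def time_attribute_reads(device_dir):
    """Read each regular file directly under device_dir and time the read.

    Returns a list of (name, seconds, result) tuples, where result is the
    stripped file content, or an error string if the read failed.
    """
    timings = []
    for name in sorted(os.listdir(device_dir)):
        path = os.path.join(device_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories such as ports/
        start = time.monotonic()
        try:
            with open(path, "r", errors="replace") as f:
                result = f.read().strip()
        except OSError as exc:
            result = "error: %s" % exc.strerror
        timings.append((name, time.monotonic() - start, result))
    return timings


if __name__ == "__main__":
    device_dir = "/sys/class/infiniband/mlx4_0"  # device named in the ibstat error
    if os.path.isdir(device_dir):
        for name, seconds, result in time_attribute_reads(device_dir):
            print("%-20s %8.3f s  %s" % (name, seconds, result))
    else:
        print("%s not present" % device_dir)
```

Any attribute that takes seconds rather than microseconds to read, or that
returns an error, would be a candidate for the call ibstat is stuck on.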


Regards,



-- 
DI Simon Kainz
Graz, University of Technology
Department Computing
Phone: ++43 (0) 316 / 873 6885
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ibstat.strace
URL: <http://lists.alioth.debian.org/pipermail/pkg-ofed-devel/attachments/20091013/676301da/attachment.asc>

