[Pkg-ofed-devel] ofa-kernel: ib_query_gid() failed
Simon Kainz
simon.kainz at tugraz.at
Tue Oct 13 09:01:34 UTC 2009
Guy Coates wrote:
> Mario Lang wrote:
>> Guy Coates <gmpc at sanger.ac.uk> writes:
>>
>>> Also, if you add the GUIDs into /etc/default/opensm, you should avoid
>>> the ibstat call in the opensm init script altogether.
>> Yes, I know, and that is what I actually did.
>>
>> However, ibstat failing is, as you wrote in another mail in this thread,
>> pointing to something very fundamental being broken.
>>
> Could you post an strace of the failing ibstat?
>
> Cheers,
>
> Guy
>
Hello. We tried some of your suggestions, in particular unloading all
modules and then loading only mlx4_ib and ib_umad. Please see below.
Loading mlx4_core with debug output enabled gives:
[ 380.184470] mlx4_core 0000:82:00.0: FW version 2.6.900 (cmd intf rev 3), max commands 16
[ 380.184478] mlx4_core 0000:82:00.0: Catastrophic error buffer at 0x1f020, size 0x10, BAR 0
[ 380.184481] mlx4_core 0000:82:00.0: FW size 385 KB
[ 380.184485] mlx4_core 0000:82:00.0: Clear int @ f0058, BAR 0
[ 380.186575] mlx4_core 0000:82:00.0: Mapped 26 chunks/6168 KB for FW.
[ 381.242827] mlx4_core 0000:82:00.0: BlueFlame available (reg size 512, regs/page 256)
[ 381.242987] mlx4_core 0000:82:00.0: Base MM extensions: flags 00000cc0, rsvd L_Key 00000500
[ 381.242994] mlx4_core 0000:82:00.0: Max ICM size 4294967296 MB
[ 381.242997] mlx4_core 0000:82:00.0: Max QPs: 16777216, reserved QPs: 64, entry size: 256
[ 381.243004] mlx4_core 0000:82:00.0: Max SRQs: 16777216, reserved SRQs: 64, entry size: 128
[ 381.243009] mlx4_core 0000:82:00.0: Max CQs: 16777216, reserved CQs: 128, entry size: 128
[ 381.243015] mlx4_core 0000:82:00.0: Max EQs: 512, reserved EQs: 4, entry size: 128
[ 381.243021] mlx4_core 0000:82:00.0: reserved MPTs: 16, reserved MTTs: 16
[ 381.243027] mlx4_core 0000:82:00.0: Max PDs: 8388608, reserved PDs: 4, reserved UARs: 1
[ 381.243033] mlx4_core 0000:82:00.0: Max QP/MCG: 8388608, reserved MGMs: 0
[ 381.243039] mlx4_core 0000:82:00.0: Max CQEs: 4194304, max WQEs: 16384, max SRQ WQEs: 16384
[ 381.243045] mlx4_core 0000:82:00.0: Local CA ACK delay: 15, max MTU: 4096, port width cap: 3
[ 381.243051] mlx4_core 0000:82:00.0: Max SQ desc size: 1008, max SQ S/G: 62
[ 381.243057] mlx4_core 0000:82:00.0: Max RQ desc size: 512, max RQ S/G: 32
[ 381.243063] mlx4_core 0000:82:00.0: Max GSO size: 131072
[ 381.243068] mlx4_core 0000:82:00.0: DEV_CAP flags:
[ 381.243074] mlx4_core 0000:82:00.0: RC transport
[ 381.243079] mlx4_core 0000:82:00.0: UC transport
[ 381.243084] mlx4_core 0000:82:00.0: UD transport
[ 381.243086] mlx4_core 0000:82:00.0: XRC transport
[ 381.243092] mlx4_core 0000:82:00.0: FCoIB support
[ 381.243097] mlx4_core 0000:82:00.0: SRQ support
[ 381.243103] mlx4_core 0000:82:00.0: IPoIB checksum offload
[ 381.243108] mlx4_core 0000:82:00.0: P_Key violation counter
[ 381.243113] mlx4_core 0000:82:00.0: Q_Key violation counter
[ 381.243119] mlx4_core 0000:82:00.0: APM support
[ 381.243124] mlx4_core 0000:82:00.0: Atomic ops support
[ 381.243129] mlx4_core 0000:82:00.0: Address vector port checking support
[ 381.243135] mlx4_core 0000:82:00.0: UD multicast support
[ 381.243140] mlx4_core 0000:82:00.0: Router support
[ 381.243155] mlx4_core 0000:82:00.0: profile[ 0] ( CMPT): 2^26 entries @ 0x 0, size 0x 100000000
[ 381.243162] mlx4_core 0000:82:00.0: profile[ 1] (RDMARC): 2^22 entries @ 0x 100000000, size 0x 8000000
[ 381.243168] mlx4_core 0000:82:00.0: profile[ 2] ( QP): 2^18 entries @ 0x 108000000, size 0x 4000000
[ 381.243175] mlx4_core 0000:82:00.0: profile[ 3] ( MTT): 2^20 entries @ 0x 10c000000, size 0x 4000000
[ 381.243181] mlx4_core 0000:82:00.0: profile[ 4] ( DMPT): 2^19 entries @ 0x 110000000, size 0x 2000000
[ 381.243188] mlx4_core 0000:82:00.0: profile[ 5] ( ALTC): 2^18 entries @ 0x 112000000, size 0x 1000000
[ 381.243194] mlx4_core 0000:82:00.0: profile[ 6] ( SRQ): 2^16 entries @ 0x 113000000, size 0x 800000
[ 381.243200] mlx4_core 0000:82:00.0: profile[ 7] ( CQ): 2^16 entries @ 0x 113800000, size 0x 800000
[ 381.243207] mlx4_core 0000:82:00.0: profile[ 8] ( MCG): 2^13 entries @ 0x 114000000, size 0x 200000
[ 381.243213] mlx4_core 0000:82:00.0: profile[ 9] ( AUXC): 2^18 entries @ 0x 114200000, size 0x 40000
[ 381.243219] mlx4_core 0000:82:00.0: profile[10] ( EQ): 2^06 entries @ 0x 114240000, size 0x 2000
[ 381.243225] mlx4_core 0000:82:00.0: HCA context memory: reserving 4524296 KB
[ 381.243252] mlx4_core 0000:82:00.0: 4524296 KB of HCA context requires 8876 KB aux memory.
[ 381.278964] mlx4_core 0000:82:00.0: Mapped 38 chunks/8876 KB for ICM aux.
[ 381.279959] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 0 for ICM.
[ 381.281767] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 40000000 for ICM.
[ 381.283579] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 80000000 for ICM.
[ 381.283622] mlx4_core 0000:82:00.0: Mapped 1 chunks/4 KB at c0000000 for ICM.
[ 381.283665] mlx4_core 0000:82:00.0: Mapped page at 20264a8000 to 114240000 for ICM.
[ 381.283704] mlx4_core 0000:82:00.0: Mapped page at 20264a8000 to 114241000 for ICM.
[ 381.285482] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 10c000000 for ICM.
[ 381.287393] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 110000000 for ICM.
[ 381.289186] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 108000000 for ICM.
[ 381.290969] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114200000 for ICM.
[ 381.292754] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 112000000 for ICM.
[ 381.294539] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 100000000 for ICM.
[ 381.296319] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 113800000 for ICM.
[ 381.298113] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 113000000 for ICM.
[ 381.299896] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114000000 for ICM.
[ 381.303059] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114040000 for ICM.
[ 381.303556] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114080000 for ICM.
[ 381.305345] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 1140c0000 for ICM.
[ 381.307125] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114100000 for ICM.
[ 381.308902] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114140000 for ICM.
[ 381.310681] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 114180000 for ICM.
[ 381.312468] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 1141c0000 for ICM.
[ 382.191800] mlx4_core 0000:82:00.0: Mapped 1 chunks/256 KB at 10c040000 for ICM.
[ 382.202425] mlx4_core 0000:82:00.0: NOP command IRQ test passed
After loading mlx4_ib, dmesg shows:
[ 578.247698] mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
Running ibstat then fails:
node1:~# ibstat
ibpanic: [5180] main: stat of IB device 'mlx4_0' failed: (Device or resource busy)
Please find the strace log of ibstat attached. In it, the call
open("/sys/class/infiniband/mlx4_0/sys_image_guid", O_RDONLY) = 3
hangs for a few seconds.
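To narrow down which sysfs attribute is slow without going through ibstat,
something like the following rough sketch could be used. It only times plain
reads of a few attribute files. The helper names are made up for illustration,
and apart from sys_image_guid (taken from the strace above) the attribute
names node_guid and fw_ver are assumptions about what the device directory
contains.

```python
import os
import sys
import time


def timed_read(path):
    """Read a small text file; return (contents or error text, seconds taken)."""
    start = time.monotonic()
    try:
        with open(path) as f:
            value = f.read().strip()
    except OSError as exc:
        # Keep going on errors such as EBUSY, but still record how long it took.
        value = "error: " + (exc.strerror or str(exc))
    return value, time.monotonic() - start


def time_attributes(directory, names):
    """Time the read of each named attribute file under directory."""
    return {name: timed_read(os.path.join(directory, name)) for name in names}


if __name__ == "__main__":
    # Device directory from the strace; override it with the first argument.
    dev_dir = sys.argv[1] if len(sys.argv) > 1 else "/sys/class/infiniband/mlx4_0"
    # node_guid and fw_ver are assumed attribute names, not taken from the log.
    attrs = ["node_guid", "sys_image_guid", "fw_ver"]
    for name, (value, secs) in time_attributes(dev_dir, attrs).items():
        print(f"{name}: {value} ({secs:.3f} s)")
```

A read that takes seconds rather than milliseconds, or that returns "Device
or resource busy", would point at the same attribute the strace shows.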
Regards,
--
DI Simon Kainz
Graz, University of Technology
Department Computing
Phone: ++43 (0) 316 / 873 6885
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ibstat.strace
URL: <http://lists.alioth.debian.org/pipermail/pkg-ofed-devel/attachments/20091013/676301da/attachment.asc>