[Debian-ha-maintainers] Oops breaks resource failover in RHCS

Fabio M. Di Nitto fdinitto at redhat.com
Wed Feb 17 06:37:59 UTC 2010


Please file a bug on bugzilla.redhat.com -> Fedora rawhide -> component
cluster. I'll take care of reassigning it to the correct maintainer.

Yes, I understand you are running Debian, but we use the RH bugzilla
instance for upstream, so just go ahead and add all the info there.

thanks
Fabio

On 2/17/2010 3:07 AM, Ernesto Rodriguez Reina wrote:
> Hi, I wrote to you once before because I had a very similar problem, and I
> thought it was completely solved. Unfortunately I saw the OOPS again. We
> have repeated the test several times and always get the same result. Here
> is my scenario:
> 
> Node master with nodeid=1;
> Node spare with nodeid=2;
> Node slave1 with nodeid=3;
> Node slave2 with nodeid=4;
> 
> We shut down node master. Services are correctly relocated. We turn on
> node master and again services are correctly relocated. We then
> shut down node master again and then the oops appears, but only on node
> spare; nodes slave1 and slave2 seem to be OK with services running.
> We tested with two different kernels, 2.6.32.8 and 2.6.31.5 (with patch
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=063c4c99630c0b06afad080d2a18bda64172c1a2).
> 
> We are using RHCS 3.0.4-2 from the Debian mirror. Any ideas on how to
> solve this? We are going to test with RHCS 3.0.6-5.
> 
> Hoping you can help me. Best regards,
> Ernesto
> 
> The oops:
> 
> with kernel 2.6.32.8:
> Feb 16 19:48:22 spare kernel: [ 1080.523027] INFO: task rgmanager:6531 blocked for more than 120 seconds.
> Feb 16 19:48:22 spare kernel: [ 1080.523091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 16 19:48:22 spare kernel: [ 1080.523166] rgmanager     D 0000000000000000     0  6531   2363 0x00000000
> Feb 16 19:48:22 spare kernel: [ 1080.523170]  ffffffff826fc080 0000000000000086 0000000000000296 ffffffff8104d7d9
> Feb 16 19:48:22 spare kernel: [ 1080.523175]  ffff8801ab0a5038 000000000000e1c8 ffff8801ac083fd8 ffff8801ab915000
> Feb 16 19:48:22 spare kernel: [ 1080.523178]  ffff8801ab9154b0 ffff8801ab0a5010 ffffffff82a420d8 ffff8801ab9154b0
> Feb 16 19:48:22 spare kernel: [ 1080.523181] Call Trace:
> Feb 16 19:48:22 spare kernel: [ 1080.523189]  [<ffffffff8104d7d9>] ? try_to_wake_up+0x109/0x2d0
> Feb 16 19:48:22 spare kernel: [ 1080.523194]  [<ffffffff81234bc4>] ? cpumask_any_but+0x24/0x40
> Feb 16 19:48:22 spare kernel: [ 1080.523199]  [<ffffffff8140d7a5>] ? __down_read+0x85/0xb5
> Feb 16 19:48:22 spare kernel: [ 1080.523208]  [<ffffffffa04b7960>] ? dlm_user_request+0x60/0x240 [dlm]
> Feb 16 19:48:22 spare kernel: [ 1080.523212]  [<ffffffff8110a72c>] ? __kmalloc+0x11c/0x250
> Feb 16 19:48:22 spare kernel: [ 1080.523217]  [<ffffffffa04c2196>] ? device_write+0x686/0x790 [dlm]
> Feb 16 19:48:22 spare kernel: [ 1080.523221]  [<ffffffff81111f7b>] ? vfs_write+0xcb/0x1a0
> Feb 16 19:48:22 spare kernel: [ 1080.523224]  [<ffffffff81112153>] ? sys_write+0x53/0xa0
> Feb 16 19:48:22 spare kernel: [ 1080.523227]  [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b
> 
> with kernel 2.6.31.5 (with patch
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=063c4c99630c0b06afad080d2a18bda64172c1a2):
> Feb 16 20:35:27 spare kernel: [ 1320.436213] INFO: task rgmanager:13795 blocked for more than 120 seconds.
> Feb 16 20:35:27 spare kernel: [ 1320.436277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 16 20:35:27 spare kernel: [ 1320.436352] rgmanager     D 0000000000000000     0 13795   2247 0x00000000
> Feb 16 20:35:27 spare kernel: [ 1320.436357]  ffff8801ae219000 0000000000000086 ffff88019293fd88 ffff8801a1cfbe90
> Feb 16 20:35:27 spare kernel: [ 1320.436360]  0000000000013f80 000000000000e168 ffff8801928a1000 ffff8801928a14b8
> Feb 16 20:35:27 spare kernel: [ 1320.436364]  0000000200000002 00000001000bf260 ffff8801ab843038 ffff8801928a14b8
> Feb 16 20:35:27 spare kernel: [ 1320.436367] Call Trace:
> Feb 16 20:35:27 spare kernel: [ 1320.436376]  [<ffffffff813ea425>] ? __down_read+0x85/0xb5
> Feb 16 20:35:27 spare kernel: [ 1320.436389]  [<ffffffffa052c970>] ? dlm_user_request+0x60/0x240 [dlm]
> Feb 16 20:35:27 spare kernel: [ 1320.436393]  [<ffffffff81077aef>] ? wake_futex+0x3f/0x80
> Feb 16 20:35:27 spare kernel: [ 1320.436397]  [<ffffffff810d4c40>] ? shmem_delete_inode+0x0/0x110
> Feb 16 20:35:27 spare kernel: [ 1320.436401]  [<ffffffff8100caee>] ? invalidate_interrupt0+0xe/0x20
> Feb 16 20:35:27 spare kernel: [ 1320.436406]  [<ffffffff810fc1cc>] ? __kmalloc+0x11c/0x250
> Feb 16 20:35:27 spare kernel: [ 1320.436414]  [<ffffffffa05370f6>] ? device_write+0x686/0x790 [dlm]
> Feb 16 20:35:27 spare kernel: [ 1320.436418]  [<ffffffff8105c7a3>] ? do_sigaction+0x1b3/0x1d0
> Feb 16 20:35:27 spare kernel: [ 1320.436421]  [<ffffffff8105c691>] ? do_sigaction+0xa1/0x1d0
> Feb 16 20:35:27 spare kernel: [ 1320.436424]  [<ffffffff81102e0b>] ? vfs_write+0xcb/0x1a0
> Feb 16 20:35:27 spare kernel: [ 1320.436427]  [<ffffffff81102fe3>] ? sys_write+0x53/0xa0
> Feb 16 20:35:27 spare kernel: [ 1320.436430]  [<ffffffff8100bf02>] ? system_call_fastpath+0x16/0x1b
> 