[Debian-ha-maintainers] Oops breaks resource failover in RHCS

Thu Feb 18 19:01:38 UTC 2010

I was trying to make my cluster work with RHCS 3.0.6-5, and all I get is:

Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] CLM CONFIGURATION CHANGE
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] New Configuration:
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] #011r(0) ip(10.10.10.1)
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] #011r(0) ip(10.10.10.2)
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] Members Left:
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] Members Joined:
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] CLM CONFIGURATION CHANGE
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] New Configuration:
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] #011r(0) ip(10.10.10.1)
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] #011r(0) ip(10.10.10.2)
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] Members Left:
Feb 18 13:27:15 spare corosync[2109]:   [CLM   ] Members Joined:
Feb 18 13:27:15 spare corosync[2109]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Feb 18 13:27:23 spare fenced[2162]: daemon cpg_join error retrying
Feb 18 13:27:23 spare gfs_controld[2196]: daemon cpg_join error retrying
Feb 18 13:27:23 spare dlm_controld[2181]: daemon cpg_join error retrying
Feb 18 13:27:25 spare corosync[2109]:   [TOTEM ] A processor failed, forming new configuration.

This appears again and again and again. It always happens on the second
node to start (spare in this case, with IP 10.10.10.2). I saw it with
Linux kernels 2.6.1.32.8 and 2.6.31.5 (the latter with patch [1]).

Can anybody help me? Can anybody reproduce the problem?

Regards,
Ernesto

[1] http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=063c4c99630c0b06afad080d2a18bda64172c1a2

-- 
Ernesto Rodriguez Reina