[Debian-ha-maintainers] Bug#596694: corosync fails to start correctly

Frank Schmidt frank_schmidt at gmx.de
Mon Sep 13 13:46:09 UTC 2010


Package: corosync
Version: 1.2.1-1
Severity: important


Hi,

After a clean install of Debian squeeze and increasing the consensus timeout to 3600 (to solve
#573030), corosync does not start correctly at boot, and crm_mon is unable to connect to the cluster.

The process list (ps auxf) shows the following:

root       773  0.3  1.3 128960  5136 ?        Ssl  13:11   0:00 /usr/sbin/corosync
root       808  0.0  0.8 114484  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       809  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       810  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       811  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       812  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       813  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       925  0.1  0.4  52416  1888 ?        Sl   13:11   0:00 /usr/sbin/rsyslogd -c4
root       958  0.0  0.3  44568  1336 ?        Ss   13:11   0:00 ha_logd: read process
root       959  0.0  0.2  44568   932 ?        S    13:11   0:00  \_ ha_logd: write process

After killing the corosync processes and running

> /etc/init.d/corosync start

the process list shows:

root      1422  0.0  1.3 146160  5268 ?        Ssl  15:15   0:00 /usr/sbin/corosync
root      1433  0.0  3.1  79524 12036 ?        SLs  15:15   0:00  \_ /usr/lib/heartbeat/stonithd
103       1434  0.0  1.2  82444  4892 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/cib
root      1435  0.0  0.6  83428  2372 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/lrmd
103       1436  0.0  0.8  83504  3112 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/attrd
103       1437  0.0  0.7  83876  2972 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/pengine
103       1438  0.0  0.9  89732  3612 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/crmd

and crm_mon works as expected.
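
The two listings above differ in the children of corosync: in the working state they are the Pacemaker daemons, in the broken state they are further corosync processes. A minimal sketch of a check for that, assuming the daemon names and the /usr/lib/heartbeat path shown in the second listing (the function name pacemaker_children_ok is hypothetical):

```shell
#!/bin/sh
# Return 0 if every Pacemaker daemon seen in the working listing appears
# in the process list read from stdin, 1 otherwise.
# Assumption: daemons live under a .../heartbeat/ directory, as in the report.
pacemaker_children_ok() {
    plist=$(cat)
    for d in stonithd cib lrmd attrd pengine crmd; do
        # Match the daemon name as the last path component of a command,
        # followed by either a space (arguments) or the end of the line.
        printf '%s\n' "$plist" | grep -Eq "/heartbeat/$d( |\$)" || return 1
    done
    return 0
}

# Usage on a live system (not run here):
#   ps -eo args | pacemaker_children_ok && echo "pacemaker up" || echo "pacemaker missing"
```

Reading the process list from stdin keeps the check independent of the ps options in use, so saved output such as the listings in this report can be fed to it as well.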

The bug seems to be caused by corosync being started earlier than the syslog daemon during boot:

/etc/init.d/corosync:

#! /bin/sh
#
### BEGIN INIT INFO
# Provides:          corosync
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     S
# Default-Stop:      0 1 6
# Short-Description: corosync cluster framework
### END INIT INFO
[...]


/etc/init.d/rsyslog:

#! /bin/sh
### BEGIN INIT INFO
# Provides:          rsyslog
# Required-Start:    $remote_fs $time
# Required-Stop:     umountnfs $time
# X-Stop-After:      sendsigs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: enhanced syslogd
# Description:       Rsyslog is an enhanced multi-threaded syslogd.
#                    It is quite compatible to stock sysklogd and can be
#                    used as a drop-in replacement.
### END INIT INFO
[...]
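
The two headers above can be compared field by field. Note that corosync declares Default-Start: S while rsyslog declares 2 3 4 5, and that corosync's Required-Start does not mention $syslog. A small sketch for pulling such fields out of an init script, where the helper name lsb_field is hypothetical and not part of any Debian tool:

```shell
#!/bin/sh
# Print the value of one LSB header field (for example "Default-Start")
# from an init script. Only lines between the BEGIN INIT INFO and
# END INIT INFO markers are considered.
# $1 = path to the init script, $2 = field name without the colon
lsb_field() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1" |
        sed -n "s/^#[[:space:]]*$2:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p"
}

# Usage with the paths from the report (not run here):
#   lsb_field /etc/init.d/corosync Default-Start
#   lsb_field /etc/init.d/rsyslog  Default-Start
#   lsb_field /etc/init.d/corosync Required-Start
```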


It seems that corosync must be started after the syslog daemon, and perhaps even after ha_logd
(part of the cluster-glue package):

/etc/init.d/logd:

[...]
### BEGIN INIT INFO
# Description: ha_logd is a non-blocking logging daemon.
#       It can log messages either to a file or through syslog
#       daemon.
# Short-Description: ha_logd logging daemon
# Provides: ha_logd
# Required-Start: $network $syslog $remote_fs
# Required-Stop: $network $syslog $remote_fs
# X-Start-Before: heartbeat openais
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
### END INIT INFO

corosync seems to be missing from the 'X-Start-Before' line here.
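
One way to express the ordering described above, as a sketch only and not necessarily the change the maintainers would choose: add $syslog to corosync's Required-Start and Required-Stop, and list corosync in logd's X-Start-Before. Moving Default-Start from S to 2 3 4 5 is an additional assumption, made here because rsyslog itself only starts in runlevels 2 to 5.

```shell
# /etc/init.d/corosync (sketch of a possible header, untested)
### BEGIN INIT INFO
# Provides:          corosync
# Required-Start:    $network $remote_fs $syslog
# Required-Stop:     $network $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: corosync cluster framework
### END INIT INFO

# /etc/init.d/logd (only the changed line, untested)
# X-Start-Before: heartbeat openais corosync
```

After editing a header, the rc symlinks would presumably have to be regenerated before the new order takes effect, for example with "update-rc.d -f corosync remove" followed by "update-rc.d corosync defaults".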

Greetings,

Frank


-- System Information:
Debian Release: squeeze/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages corosync depends on:
ii  adduser                       3.112      add and remove users and groups
ii  libc6                         2.11.2-2   Embedded GNU C Library: Shared lib
ii  libcorosync4                  1.2.1-1    Standards-based cluster framework 
ii  lsb-base                      3.2-23.1   Linux Standard Base 3.2 init scrip

corosync recommends no packages.

corosync suggests no packages.

-- Configuration Files:
/etc/corosync/corosync.conf changed:
totem {
	version: 2
	# How long before declaring a token lost (ms)
	token: 3000
	# How many token retransmits before forming a new configuration
	token_retransmits_before_loss_const: 10
	# How long to wait for join messages in the membership protocol (ms)
	join: 60
	# How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
	consensus: 3600
	# Turn off the virtual synchrony filter
	vsftype: none
	# Number of messages that may be sent by one processor on receipt of the token
	max_messages: 20
	# Limit generated nodeids to 31-bits (positive signed integers)
	clear_node_high_bit: yes
	# Disable encryption
 	secauth: off
	# How many threads to use for encryption/decryption
 	threads: 0
	# Optionally assign a fixed node id (integer)
	# nodeid: 1234
	# This specifies the mode of redundant ring, which may be none, active, or passive.
 	rrp_mode: none
 	interface {
		# The following values need to be set based on your environment 
		ringnumber: 0
		bindnetaddr: 127.0.0.1 
		mcastaddr: 226.94.1.1
		mcastport: 5405
	}
}
amf {
	mode: disabled
}
service {
 	# Load the Pacemaker Cluster Resource Manager
 	ver:       0
 	name:      pacemaker
}
aisexec {
        user:   root
        group:  root
}
logging {
        fileline: off
        to_stderr: yes
        to_logfile: no
        to_syslog: yes
	syslog_facility: daemon
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
                tags: enter|leave|trace1|trace2|trace3|trace4|trace6
        }
}

/etc/default/corosync changed:
START=yes


-- no debconf information