Bug#563602: mdadm: autocheck running checkarray fails and degrades array when weekly cron jobs tick off

Mike Bilow mike at bilow.com
Mon Jan 4 00:26:21 UTC 2010


Package: mdadm
Version: 2.6.7.2-3
Severity: grave
Justification: causes non-serious data loss


Because of the size of the media being checked, the "checkarray" 
instance that was invoked at 0057 had not yet completed when the usual 
weekly cron jobs triggered at 0626. It is common for those cron jobs to 
send signals to various daemons in order to allow log rotation.

I am fairly confident that there is nothing actually wrong with the 
disks. I have no hard proof that it was the cron jobs that caused 
"checkarray" to fail in such a way as to degrade array "md1" and mark 
"sda2" as faulty, but the coincidence is too strong to ignore. I am not 
familiar enough with the weekly cron jobs installed by default to say 
which one is involved, but experience teaches that problems occurring 
exactly at 0626 are extremely likely to be associated with the signals 
sent to daemons.

Jan  3 00:57:25 virtual1 mdadm[3471]: RebuildStarted event detected on 
md device /dev/md1
Jan  3 02:22:27 virtual1 mdadm[3471]: Rebuild20 event detected on md 
device /dev/md1
Jan  3 03:50:29 virtual1 mdadm[3471]: Rebuild40 event detected on md 
device /dev/md1
Jan  3 05:21:32 virtual1 mdadm[3471]: Rebuild60 event detected on md 
device /dev/md1
Jan  3 06:26:14 virtual1 mdadm[3471]: Fail event detected on md device 
/dev/md1, component device /dev/sda2
Jan  3 06:26:14 virtual1 mdadm[3471]: RebuildFinished event detected on 
md device /dev/md1, component device  mismatches found: 128
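
The claim that the check could not have finished before 0626 can be 
sanity-checked by extrapolating from the progress events above. The 
following is a minimal Python sketch, not anything mdadm itself does: it 
assumes the log lines are unwrapped, that the year is 2010 (syslog omits 
it), and that the check proceeds at a roughly constant rate. The names 
parse_events and projected_finish are made up for this illustration.

```python
import re
from datetime import datetime

# Log lines copied from this report, unwrapped onto single lines.
LOG = """\
Jan  3 00:57:25 virtual1 mdadm[3471]: RebuildStarted event detected on md device /dev/md1
Jan  3 02:22:27 virtual1 mdadm[3471]: Rebuild20 event detected on md device /dev/md1
Jan  3 03:50:29 virtual1 mdadm[3471]: Rebuild40 event detected on md device /dev/md1
Jan  3 05:21:32 virtual1 mdadm[3471]: Rebuild60 event detected on md device /dev/md1
Jan  3 06:26:14 virtual1 mdadm[3471]: Fail event detected on md device /dev/md1, component device /dev/sda2
"""

LINE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ mdadm\[\d+\]: (\w+) event")


def parse_events(text, year=2010):
    """Return (timestamp, event name) pairs; the year is an assumption."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, m.group(2)))
    return events


def projected_finish(events):
    """Linearly extrapolate the finish time from the latest RebuildNN event."""
    start = next(t for t, name in events if name == "RebuildStarted")
    when, pct = max(
        (t, int(name[len("Rebuild"):]))
        for t, name in events
        if re.fullmatch(r"Rebuild\d+", name)
    )
    return start + (when - start) / (pct / 100)


if __name__ == "__main__":
    events = parse_events(LOG)
    fail_time = next(t for t, name in events if name == "Fail")
    print("Fail event at:     ", fail_time)
    print("projected finish at:", projected_finish(events))
```

With the timestamps above, the 60% mark was reached 4 hours 24 minutes 
after the start, which projects completion at roughly 08:17, well after 
the Fail event at 06:26:14.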

I was able to start resynchronizing the array (so far 96% done) by 
removing and re-adding the component ("mdadm --remove /dev/md1 /dev/sda2" 
followed by "mdadm --add /dev/md1 /dev/sda2") without even rebooting 
the system. However, the array was left degraded for several hours until 
I received and was able to act on the e-mail message from the monitoring 
daemon, and the resynchronization is taking about 10 hours. The array 
would have stayed degraded until someone intervened manually.
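
Since the degraded state persists until someone notices it, it may help 
to check for it independently of the e-mail alert. The following is a 
minimal Python sketch, assuming the /proc/mdstat layout shown later in 
this report, where "[2/1]" means 2 configured devices with only 1 
active. The function name degraded_arrays is made up for this 
illustration, and the sample text is copied from this report rather 
than read from a live system.

```python
import re

# Snapshot of /proc/mdstat taken from this report during the resync.
MDSTAT = """\
Personalities : [raid1]
md1 : active raid1 sda2[2] sdb2[1]
      974808064 blocks [2/1] [_U]
      [===================>.]  recovery = 96.2% (938482880/974808064) finish=21.8min speed=27728K/sec

md0 : active raid1 sda1[0] sdb1[1]
      1951744 blocks [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(text):
    """Return names of arrays with fewer active than configured devices."""
    degraded = []
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\d+) :", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "974808064 blocks [2/1] [_U]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]", line)
        if status and current is not None:
            configured, active = int(status.group(1)), int(status.group(2))
            if active < configured:
                degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(MDSTAT))
```

On a live system the same function could be fed the contents of 
/proc/mdstat instead of the embedded sample.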

While I was writing this bug report, the resynchronization completed 
successfully and promoted the array out of degraded mode:

Jan  3 10:01:31 virtual1 mdadm[3471]: RebuildStarted event detected on 
md device /dev/md1
Jan  3 11:35:31 virtual1 mdadm[3471]: Rebuild20 event detected on md 
device /dev/md1
Jan  3 13:23:32 virtual1 mdadm[3471]: Rebuild40 event detected on md 
device /dev/md1
Jan  3 15:23:33 virtual1 mdadm[3471]: Rebuild60 event detected on md 
device /dev/md1
Jan  3 17:22:33 virtual1 mdadm[3471]: Rebuild80 event detected on md 
device /dev/md1
Jan  3 19:21:20 virtual1 mdadm[3471]: RebuildFinished event detected on 
md device /dev/md1, component device  mismatches found: 128
Jan  3 19:21:20 virtual1 mdadm[3471]: SpareActive event detected on md 
device /dev/md1, component device /dev/sda2
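
For comparison with the "about 10 hours" estimate, the actual duration 
and average rate of this resynchronization can be worked out from the 
first and last timestamps above. This is a minimal Python sketch, 
assuming the year is 2010 and using the md1 size of 974808064 blocks 
(1 KiB each) from the /proc/mdstat output in this report.

```python
from datetime import datetime

# First and last events of the resync; the year 2010 is an assumption,
# since syslog timestamps do not include it.
started = datetime(2010, 1, 3, 10, 1, 31)    # RebuildStarted
finished = datetime(2010, 1, 3, 19, 21, 20)  # RebuildFinished

# Size of md1 in 1 KiB blocks, as reported by /proc/mdstat.
ARRAY_KIB = 974808064

elapsed = finished - started
avg_kib_per_s = ARRAY_KIB / elapsed.total_seconds()

print("elapsed:", elapsed)
print("average rate: %.0f KiB/s" % avg_kib_per_s)
```

The elapsed time comes to 9 hours 19 minutes 49 seconds, and the 
average rate is of the same order as the 27728K/sec that /proc/mdstat 
showed near the end of the recovery.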


-- Package-specific info:
--- mount output
/dev/md0 on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/mapper/virtual1-colossus--archive on /mnt/colossus-archive type ext3 (rw)
/dev/mapper/virtual1-images--archive on /mnt/images-archive type ext3 (rw)

--- mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=520af0cf:5d9358cf:0bfe99a0:ead009c9
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=0fe83522:3afebffc:663419d7:7a1d4550

# This file was auto-generated on Thu, 30 Oct 2008 17:06:34 +0000
# by mkconf $Id: mkconf 261 2006-11-09 13:32:35Z madduck $

--- /proc/mdstat:
Personalities : [raid1] 
md1 : active raid1 sda2[2] sdb2[1]
      974808064 blocks [2/1] [_U]
      [===================>.]  recovery = 96.2% (938482880/974808064) finish=21.8min speed=27728K/sec
      
md0 : active raid1 sda1[0] sdb1[1]
      1951744 blocks [2/2] [UU]
      
unused devices: <none>

--- /proc/partitions:
major minor  #blocks  name

   8     0  976762584 sda
   8     1    1951866 sda1
   8     2  974808135 sda2
   8    16  976762584 sdb
   8    17    1951866 sdb1
   8    18  974808135 sdb2
   9     0    1951744 md0
   9     1  974808064 md1
 253     0   15728640 dm-0
 253     1   41943040 dm-1
 253     2    2097152 dm-2
 253     3    2097152 dm-3
 253     4    2097152 dm-4
 253     5    2097152 dm-5
 253     6  209715200 dm-6
 253     7   20971520 dm-7
 253     8     258048 dm-8

--- initrd.img-2.6.26-2-xen-686:
33375 blocks
ed87a4a20991312e12f397abd288fd54  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/dm-log.ko
61a6adc3a4dffac9ee5ad96f5196b590  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/raid0.ko
b66b54b318430347504889d45bf16ba2  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/multipath.ko
07c46476b799567a5b551f4c0ad71482  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/raid1.ko
599daaac429638114e9acd150124145e  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/linear.ko
75ac8c783adaafe1e68aee05fd5fdccd  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/raid10.ko
5d62f4f02384a20b6410ada2724dc14c  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/dm-snapshot.ko
e3e027bf1ef37e06b6d976dedf2faf22  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/dm-mirror.ko
c91ab1a51ed03662c06c924b64e19f9e  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/dm-mod.ko
49b779483baf3655fb819c1ebc0835f3  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/md-mod.ko
443504a91e24c3a87d36feccc5019bb8  ./lib/modules/2.6.26-2-xen-686/kernel/drivers/md/raid456.ko
e1e2d0e985196fecaf41fb42e9968af2  ./scripts/local-top/mdadm
845f04e5ccb4e42938e7779d06a304b3  ./etc/mdadm/mdadm.conf
ea9abd44166c288560f8c9789cb3949d  ./sbin/mdadm

--- /proc/modules:
dm_mirror 16320 0 - Live 0xee157000
dm_log 9412 1 dm_mirror, Live 0xee15c000
dm_snapshot 15108 0 - Live 0xee079000
dm_mod 47336 22 dm_mirror,dm_log,dm_snapshot, Live 0xee0c9000
raid1 19200 2 - Live 0xee084000
md_mod 69212 3 raid1, Live 0xee1a1000

--- /var/log/syslog:

--- volume detail:
/dev/hdc is not recognised by mdadm.
/dev/sda is not recognised by mdadm.
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 520af0cf:5d9358cf:0bfe99a0:ead009c9
  Creation Time : Thu Oct 30 12:58:53 2008
     Raid Level : raid1
  Used Dev Size : 1951744 (1906.32 MiB 1998.59 MB)
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sun Jan  3 18:57:29 2010
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e4011b7c - correct
         Events : 31


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
--
/dev/sda2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0fe83522:3afebffc:663419d7:7a1d4550
  Creation Time : Thu Oct 30 13:00:18 2008
     Raid Level : raid1
  Used Dev Size : 974808064 (929.65 GiB 998.20 GB)
     Array Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Sun Jan  3 18:57:29 2010
          State : active
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
       Checksum : a2c9838d - correct
         Events : 25749


      Number   Major   Minor   RaidDevice State
this     2       8        2        2      spare   /dev/sda2

   0     0       0        0        0      removed
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8        2        2      spare   /dev/sda2
--
/dev/sdb is not recognised by mdadm.
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 520af0cf:5d9358cf:0bfe99a0:ead009c9
  Creation Time : Thu Oct 30 12:58:53 2008
     Raid Level : raid1
  Used Dev Size : 1951744 (1906.32 MiB 1998.59 MB)
     Array Size : 1951744 (1906.32 MiB 1998.59 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sun Jan  3 18:57:29 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e4011bac - correct
         Events : 30


      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
--
/dev/sdb2:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0fe83522:3afebffc:663419d7:7a1d4550
  Creation Time : Thu Oct 30 13:00:18 2008
     Raid Level : raid1
  Used Dev Size : 974808064 (929.65 GiB 998.20 GB)
     Array Size : 974808064 (929.65 GiB 998.20 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Sun Jan  3 18:57:29 2010
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
       Checksum : a2c9e839 - correct
         Events : 25750


      Number   Major   Minor   RaidDevice State
this     1       8       18        1      active sync   /dev/sdb2

   0     0       0        0        0      removed
   1     1       8       18        1      active sync   /dev/sdb2
   2     2       8        2        2      spare   /dev/sda2
--

--- /proc/cmdline
root=/dev/md0 ro console=tty0

--- grub legacy:
module		/boot/vmlinuz-2.6.26-2-xen-686 root=/dev/md0 ro console=tty0
module		/boot/vmlinuz-2.6.18-6-xen-vserver-686 root=/dev/md0 ro console=tty0
module		/boot/vmlinuz-2.6.18-6-xen-686 root=/dev/md0 ro console=tty0
kernel		/boot/vmlinuz-2.6.26-2-xen-686 root=/dev/md0 ro 
kernel		/boot/vmlinuz-2.6.26-2-xen-686 root=/dev/md0 ro single
kernel		/boot/vmlinuz-2.6.26-2-686 root=/dev/md0 ro 
kernel		/boot/vmlinuz-2.6.26-2-686 root=/dev/md0 ro single
kernel		/boot/vmlinuz-2.6.18-6-xen-vserver-686 root=/dev/md0 ro 
kernel		/boot/vmlinuz-2.6.18-6-xen-vserver-686 root=/dev/md0 ro single
kernel		/boot/vmlinuz-2.6.18-6-xen-686 root=/dev/md0 ro 
kernel		/boot/vmlinuz-2.6.18-6-xen-686 root=/dev/md0 ro single
kernel		/boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro 
kernel		/boot/vmlinuz-2.6.18-6-686 root=/dev/md0 ro single

--- udev:
ii  udev           0.125-7+lenny3 /dev/ and hotplug management daemon
cd6f5683974ea65603f04ec699b3cff2  /etc/udev/rules.d/65_mdadm.vol_id.rules

--- /dev:
brw-rw---- 1 root disk 9, 0 2009-12-04 17:33 /dev/md0
brw-rw---- 1 root disk 9, 1 2009-12-04 17:33 /dev/md1

/dev/disk/by-id:
total 0
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU40812326 -> ../../sda
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU40812326-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU40812326-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU41049570 -> ../../sdb
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU41049570-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 ata-WDC_WD10EACS-00D6B0_WD-WCAU41049570-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 md-uuid-0fe83522:3afebffc:663419d7:7a1d4550 -> ../../md1
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 md-uuid-520af0cf:5d9358cf:0bfe99a0:ead009c9 -> ../../md0
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU40812326 -> ../../sda
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU40812326-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU40812326-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU41049570 -> ../../sdb
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU41049570-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 scsi-SATA_WDC_WD10EACS-00_WD-WCAU41049570-part2 -> ../../sdb2

/dev/disk/by-label:
total 0
lrwxrwxrwx 1 root root 9 2009-12-04 17:33 Xen\x20dom0 -> ../../md0

/dev/disk/by-path:
total 0
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 pci-0000:00:1f.1-ide-1:0 -> ../../hdc
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 pci-0000:00:1f.2-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 pci-0000:00:1f.2-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 pci-0000:00:1f.2-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 2009-12-04 17:33 pci-0000:00:1f.2-scsi-1:0:0:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 pci-0000:00:1f.2-scsi-1:0:0:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 2009-12-04 17:33 pci-0000:00:1f.2-scsi-1:0:0:0-part2 -> ../../sdb2

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 9 2009-12-04 17:33 8d4b7934-f2ca-4219-8525-33876d05b12a -> ../../md0

/dev/md:
total 0


-- System Information:
Debian Release: 5.0.3
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-xen-686 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages mdadm depends on:
ii  debconf                   1.5.24         Debian configuration management sy
ii  libc6                     2.7-18         GNU C Library: Shared libraries
ii  lsb-base                  3.2-20         Linux Standard Base 3.2 init scrip
ii  makedev                   2.3.1-88       creates device files in /dev
ii  udev                      0.125-7+lenny3 /dev/ and hotplug management daemo

Versions of packages mdadm recommends:
ii  exim4                         4.69-9     metapackage to ease Exim MTA (v4) 
ii  exim4-daemon-light [mail-tran 4.69-9     lightweight Exim MTA (v4) daemon
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo

mdadm suggests no packages.

-- debconf information:
  mdadm/autostart: true
  mdadm/mail_to: root
  mdadm/initrdstart_msg_errmd:
* mdadm/initrdstart: all
  mdadm/initrdstart_msg_errconf:
  mdadm/initrdstart_notinconf: false
  mdadm/initrdstart_msg_errexist:
  mdadm/initrdstart_msg_intro:
  mdadm/autocheck: true
  mdadm/initrdstart_msg_errblock:
  mdadm/start_daemon: true




