Bug#578352: mdadm: failed devices become spares!

Pierre Vignéras pierre at vigneras.name
Sat Apr 17 16:27:23 UTC 2010


Package: mdadm
Version: 2.6.7.2-3
Severity: critical
Justification: causes serious data loss

To make it short: if I grasp it correctly, mdadm removed failed USB
drives from a RAID array and then re-inserted them as spare devices.
Since then, I cannot tell mdadm that the 2 spares are actually good
drives (as I believe they are); it keeps considering them as spares.
I am quite afraid to run the RAID array again because it holds data
that I do not want to lose. To make things more complex, that RAID
array is part of an LVM volume group on which an XFS filesystem has
been installed.
Since XFS was complaining (see log), I am wondering how I should:
      - repair the RAID array;
      - repair the corresponding VG;
      - repair the XFS filesystem.
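
For reference, here is the kind of recovery sequence I have in mind.
It is completely untested, the device names are only what they happen
to be today, and /dev/VG/LV is a placeholder for whichever logical
volume carries the XFS filesystem; I would not dare to run any of this
without confirmation that it is safe:

phobos:~# mdadm --stop /dev/md2
phobos:~# mdadm --assemble --force /dev/md2 \
              /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
          # --force: accept superblocks that look out of date
phobos:~# vgchange -ay              # re-activate the VG(s) on top of md2
phobos:~# xfs_repair -n /dev/VG/LV  # -n: check only, modify nothing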

To make it long, here is what seems to have happened.
According to the log file, on 2010-04-12 around 20:10 an error occurred on sdf1:

Apr 12 19:22:44 phobos kernel: [5768580.538554] ip_tables: (C) 2000-2006 
Netfilter Core Team
Apr 12 20:10:02 phobos kernel: [5771419.310123] sd 5:0:0:0: [sdf] Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 12 20:10:02 phobos kernel: [5771419.310144] end_request: I/O error, dev 
sdf, sector 115347706
Apr 12 20:10:02 phobos kernel: [5771419.310156] raid10: Disk failure on sdf1, 
disabling device.
Apr 12 20:10:02 phobos kernel: [5771419.310158] raid10: Operation continuing 
on 3 devices.
Apr 12 20:10:02 phobos kernel: [5771419.323466] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323480]  --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323488]  disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323495]  disk 1, wo:1, o:0, dev:sdf1
Apr 12 20:10:02 phobos kernel: [5771419.323501]  disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323508]  disk 3, wo:0, o:1, dev:sde1
Apr 12 20:10:02 phobos kernel: [5771419.323801] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323813]  --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323820]  disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323826]  disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323833]  disk 3, wo:0, o:1, dev:sde1
Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, 
component device /dev/sdf1
Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md device 
/dev/md2, component device /dev/sdf1 

Is that last line normal? It seems to me that this failed drive has
been made a spare!  (I really hope that I have misunderstood
something.) Is it possible that the USB system (with its "plug'n play"
sort-of feature) made mdadm behave so strangely?

Note that at that time I was not logged in:

svig at phobos:~/data-pb$ last -x
svig     pts/1        gaia             Fri Apr 16 18:39   still logged in   
svig     pts/1        gaia             Thu Apr 15 18:14 - 18:14  (00:00)    
svig     pts/1        gaia             Tue Apr 13 19:31 - 18:14 (1+22:42)   
svig     pts/0        gaia             Tue Apr 13 19:27 - 23:03  (03:36)    
root     tty1                          Tue Apr 13 19:25   still logged in   
root     tty1                          Tue Apr 13 19:25 - 19:25  (00:00)    
runlevel (to lvl 2)   2.6.26-2-686     Tue Apr 13 19:25 - 19:29 (3+00:04)   
reboot   system boot  2.6.26-2-686     Tue Apr 13 19:25 - 19:29 (3+00:04)   
shutdown system down  2.6.26-2-686     Tue Apr 13 19:22 - 19:25  (00:02)    
runlevel (to lvl 0)   2.6.26-2-686     Tue Apr 13 19:22 - 19:22  (00:00)    
root     tty6                          Tue Apr 13 19:22 - down   (00:00)    
root     tty6                          Tue Apr 13 19:22 - 19:22  (00:00)    
svig     pts/1        gaia             Tue Apr 13 18:56 - down   (00:25)    
svig     pts/1        gaia             Tue Apr 13 18:50 - 18:56  (00:06)    
svig     pts/1        gaia             Mon Apr 12 22:00 - 18:43  (20:43)    
svig     pts/1        gaia             Mon Apr 12 19:21 - 19:38  (00:16)    
svig     pts/4        gaia             Sun Apr 11 12:05 - 12:05  (00:00)    
svig     pts/4        eeepc            Wed Apr  7 19:32 - 21:44  (02:12)    
[...]

wtmp begins Thu Apr  1 18:47:02 2010

After that, on the next day, we see:

Apr 13 06:28:58 phobos syslogd 1.5.0#5: restart.
Apr 13 08:00:02 phobos kernel: [5814019.091249] sd 2:0:0:0: [sdd] Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.091272] end_request: I/O error, dev 
sdd, sector 115351425

So another error was detected on /dev/sdd, another USB drive (which,
by the way, is not on the same USB card/controller). As expected, mdadm reacted:

Apr 13 08:00:02 phobos kernel: [5814019.091283] raid10: Disk failure on sdd1, 
disabling device.
Apr 13 08:00:02 phobos kernel: [5814019.091285] raid10: Operation continuing 
on 2 devices.

But then, for some unknown reason, it started a strange behaviour:

Apr 13 08:00:02 phobos kernel: [5814019.110225] md: recovery of RAID array md2
Apr 13 08:00:02 phobos kernel: [5814019.110250] md: minimum _guaranteed_  
speed: 1000 KB/sec/disk.
Apr 13 08:00:02 phobos kernel: [5814019.110265] md: using maximum available 
idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 13 08:00:02 phobos kernel: [5814019.110293] md: using 128k window, over a 
total of 312568576 blocks.
Apr 13 08:00:02 phobos kernel: [5814019.110308] md: resuming recovery of md2 
from checkpoint.
Apr 13 08:00:02 phobos kernel: [5814019.110323] md: md2: recovery done.

Why did mdadm try a recovery procedure with 2 devices out of 4 failed?
I suppose the reason is the previously failed drive that was (wrongly)
turned into a spare: /dev/sdf1. The recovery was also suspiciously
fast: <1ms (thanks to what? the checkpoint? an internal bitmap? see
the sketch below for how I would check).
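
In case it helps, here is how I would try to answer that bitmap
question myself; as far as I understand, -X (--examine-bitmap) has to
be run on a component device, and /dev/sde1 below is only an example:

phobos:~# mdadm -X /dev/sde1        # dump the write-intent bitmap, if any
phobos:~# mdadm -D /dev/md2 | grep -i bitmap   # 'Intent Bitmap' line, if present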

Then, while I expected /dev/sdd1 to be removed from the RAID10
array, another error occurred on that same drive, on the next
(logical?) sector...

Apr 13 08:00:02 phobos kernel: [5814019.133498] sd 2:0:0:0: [sdd] Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.133533] end_request: I/O error, dev 
sdd, sector 115351428

Finally, the filesystem started to complain, something I was not
expecting after a 'recovery done' message:

Apr 13 08:00:02 phobos kernel: [5814019.133842] I/O error in filesystem 
("dm-7") meta-data dev dm-7 block 0x1403d63       ("xlog_iodone") error 5 buf 
count 32768
Apr 13 08:00:02 phobos kernel: [5814019.133876] xfs_force_shutdown(dm-7,0x2) 
called from line 1026 of file fs/xfs/xfs_log.c.  Return address = 0xf8a351e2
Apr 13 08:00:02 phobos kernel: [5814019.133942] Filesystem "dm-7": Log I/O 
Error Detected.  Shutting down filesystem: dm-7
Apr 13 08:00:02 phobos kernel: [5814019.133966] Please umount the filesystem, 
and rectify the problem(s)
Apr 13 08:00:02 phobos kernel: [5814019.136669] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.136690]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.136704]  disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.136718]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.136731]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.139509] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.139529]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.139542]  disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.139556]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.139569]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.140077] xfs_force_shutdown(dm-7,0x2) 
called from line 789 of file fs/xfs/xfs_log.c.  Return address = 0xf8a36400
Apr 13 08:00:02 phobos kernel: [5814019.140376] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.140394]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.140408]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.140421]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos last message repeated 12 times
Apr 13 08:00:02 phobos last message repeated 7 times

And for some unknown reason (probably the same as the previous one),
the newly failed drive became a spare:

Apr 13 08:00:02 phobos mdadm[3157]: Fail event detected on md device /dev/md2, 
component device /dev/sdd1
Apr 13 08:00:02 phobos mdadm[3157]: SpareActive event detected on md device 
/dev/md2, component device /dev/sdd1
Apr 13 08:00:02 phobos last message repeated 7 times
[...many times that messages..]
Apr 13 08:00:04 phobos kernel: [5814021.301124] Filesystem "dm-7": 
xfs_log_force: error 5 returned.
Apr 13 08:00:05 phobos last message repeated 25 times
[...many times that messages..]
Apr 13 08:00:07 phobos kernel: [5814024.288015] Filesystem "dm-7": 
xfs_log_force: error 5 returned.
[...many, really many times that messages..]
Apr 13 18:47:37 phobos kernel: [5852873.772006] I/O error in filesystem 
("dm-6") meta-data dev dm-6 block 0x6b9e8       ("xfs_trans_read_buf") error 5 
buf count 4096
Apr 13 18:47:37 phobos last message repeated 20 times
Apr 13 18:47:37 phobos kernel: [5852873.799028] I/O error in filesystem 
("dm-6") meta-data dev dm-6 block 0x6b9e8       ("xfs_trans_read_buf") error 5 
buf count 4096
[...many, really many times that messages..]
Apr 13 18:49:22 phobos kernel: [5852979.352288] xfs_imap_to_bp: 
xfs_trans_read_buf()returned an error 5 on dm-6.  Returning error.
Apr 13 18:49:22 phobos kernel: [5852979.352288] I/O error in filesystem 
("dm-6") meta-data dev dm-6 block 0x165a80       ("xfs_trans_read_buf") error 
5 buf count 8192
Apr 13 18:49:22 phobos kernel: [5852979.352288] xfs_imap_to_bp: 
xfs_trans_read_buf()returned an error 5 on dm-6.  Returning error.
[...many, really many times a mixture of last messages..]
Apr 13 18:49:49 phobos kernel: [5853006.077058] I/O error in filesystem 
("dm-6") meta-data dev dm-6 block 0x265180       ("xfs_trans_read_buf") error 
5 buf count 4096
Apr 13 18:49:55 phobos kernel: [5853012.316018] Filesystem "dm-7": 
xfs_log_force: error 5 returned.
Apr 13 18:50:01 phobos kernel: [5853017.645273] Device dm-6, XFS metadata 
write error block 0x63ff3d8 in dm-6
Apr 13 18:50:07 phobos kernel: [5853024.271924] Device dm-6, XFS metadata 
write error block 0x63ff3d8 in dm-6
Apr 13 18:50:25 phobos kernel: [5853042.271139] Device dm-6, XFS metadata 
write error block 0x63ff3d8 in dm-6
Apr 13 18:50:31 phobos kernel: [5853048.316017] Filesystem "dm-7": 
xfs_log_force: error 5 returned.
Apr 13 18:50:43 phobos kernel: [5853060.269739] Device dm-6, XFS metadata 
write error block 0x63ff3d8 in dm-6
Apr 13 18:51:01 phobos kernel: [5853078.270203] Device dm-6, XFS metadata 
write error block 0x63ff3d8 in dm-6
[...many, really many times a mixture of last messages..]
Apr 13 18:56:15 phobos kernel: [5853391.956055] I/O error in filesystem 
("dm-6") meta-data dev dm-6 block 0x777d8       ("xfs_trans_read_buf") error 5 
buf count 4096
[...many, really many times a mixture of last messages..]
Apr 13 18:59:31 phobos kernel: [5853588.316018] Filesystem "dm-7": 
xfs_log_force: error 5 returned.
Apr 13 18:59:32 phobos kernel: [5853588.483447] Buffer I/O error on device 
md2, logical block 0
Apr 13 18:59:32 phobos kernel: [5853588.483471] Buffer I/O error on device 
md2, logical block 0
Apr 13 18:59:39 phobos kernel: [5853596.062375] Buffer I/O error on device 
md2, logical block 0
Apr 13 18:59:39 phobos kernel: [5853596.062399] Buffer I/O error on device 
md2, logical block 0
[...]

Since then, I have rebooted. After a few attempts to remove/add those
"failed drives" (they are actually quite new; I suspect the devices
themselves are fine and that something else went wrong, perhaps
because of the USB stack), I ended up with the following:

phobos:~# cat /proc/mdstat
Personalities : [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : inactive sdd1[2] sdc1[5](S) sdf1[4](S) sde1[3]
      1250274304 blocks

unused devices: <none>
phobos:~#

Note that the devices previously named 'sdd1' and 'sdf1' now show up
as 'sdc1' and 'sdf1' (udev and USB renaming?). Anyway, those drives
are still labeled as spares:

phobos:~# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Apr 13 19:22:21 2010
          State : active, degraded, Not Started
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

         Layout : near=2, far=1
     Chunk Size : 64K

           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
         Events : 0.90612

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1

       4       8       81        -      spare   /dev/sdf1
       5       8       33        -      spare   /dev/sdc1
phobos:~#
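
Since the kernel names keep moving around, here is how I would
identify each member independently of its current name, and judge how
stale the two "spares" really are (again just a sketch, with today's
names):

phobos:~# for d in /dev/sd[c-f]1; do \
              echo "== $d"; mdadm -E $d | grep -E 'UUID|Events|this'; \
          done
phobos:~# ls -l /dev/disk/by-id/    # names that survive sdc/sdd reshuffling

The 'this' line of -E shows which slot each device thinks it occupies,
and a large gap in the Events counters between the two active devices
and the two "spares" would at least show how far behind the failed
ones are.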

I tried some operations on the array, such as stop/add/re-add/remove:
phobos:~# history 
  [..output has been filtered...]
  162  dmesg
  163  cat /var/log/messages
  164  dmesg
  165  cat /proc/mdstat 
  166  more /var/log/messages
  167  more /var/log/messages
  168  df -i
  169  df
  170  cat /proc/mdstat 
  171  dmesg
  [...]
  175  halt -p
  176  /etc/init.d/nfs-kernel-server stop
  [...]
  185  mdadm --misc -Q --detail /dev/md2 
  [...]
  193  pvs
  206  umount /data/
  212  mdadm --manage --remove /dev/sdd1
  213  mdadm --manage --remove /dev/md2 /dev/sdd1
  215  mdadm --manage --remove /dev/md2 /dev/sdf1
  217  mdadm --manage --add /dev/md2 /dev/sdd1 /dev/sdf1
  221  dmesg
  222  cat /var/log/messages
  223  dmsetup --help
  224  cat /proc/mdstat 
  225  mdadm --detail /dev/md2
  226  man mdadm
  227  cat /proc/mdstat 
  [...]
  263  mdadm -Q -E /dev/md2
  264  mdadm -S /dev/md2
  265  cat /proc/mdstat 
  266  dmesg
  267  cat /proc/mdstat 
  [...]
  322  /etc/init.d/nfs-kernel-server stop
  323  mdadm -v --detail /dev/md2
  324  mdadm -A -v /dev/md2
  326  mdadm -v -E --detail /dev/md2
  329  mdadm -vvv -E  /dev/md2
  330  mdadm -D  /dev/md2
  331  cat /proc/mdstat 
  349  mdadm --assemble -u b34f4192:f823df58:24bf28c1:396de87f 
  350  mdadm --assemble -u b34f4192:f823df58:24bf28c1:396de87f /dev/md2
  355  mdadm -E /dev/md2
  356  mdadm -E -D -Q /dev/md2
  357  mdadm  -D -Q /dev/md2
  358  mdadm  -D -X -Q /dev/md2
  359  mdadm  -X -Q /dev/md2
  360  mdadm  -X  /dev/md2
  361  cat /var/run/mdadm/map 
  362  mdadm --stop /dev/md2
  363  mdadm -A --scan /dev/md2
  364  cat /proc/mdstat 
  365  mdadm --stop /dev/md2
  366  aptitude
  367  cat /etc/mdadm/mdadm.conf
  372  cat /sys/block/dm-2/dev 
  373  cat /sys/block/md2/dev 
  374  cat /sys/block/md2/removable 
  375  cat /sys/block/md2/size 
  376  cat /sys/block/md2/capability 
  377  cat /sys/block/md2/power/
  378  cat /sys/block/md2/power/wakeup 
  379  cat /sys/block/md2/slaves/sdc1/sta
  380  cat /sys/block/md2/slaves/sdc1/stat 
  381  cat /sys/block/md2/md/raid_disks 
  382  cat /sys/block/md2/md/new_dev 
  383  cat /sys/block/md2/md/metadata_version 
  384  man mdadm
  385  cat /sys/block/md0/md/dev-sda1/errors 
  386  cat /sys/block/md2/md/dev-sdd1/errors 
  387  cat /sys/block/md2/md/dev-sdd1/state 
  388  cat /sys/block/md2/md/dev-sdd1/*
  389  cat /sys/block/md2/md/dev-sd?1/*
  390  mdadm -D /dev/md2
  [...]
  400  cat /proc/mdstat 
  401  mdadm -A --update=summaries /dev/md2 /dev/sdc1 /dev/sdd1 /dev/sde1 
/dev/sdf1
  402  cat /proc/mdstat 
  403  dmesg
  404  mdadm --stop /dev/md2
  405  mdadm -vvv -A --update=summaries /dev/md2 /dev/sdc1 /dev/sdd1 /dev/sde1 
/dev/sdf1
  406  cat /proc/mdstat 
  407  grep FailSpare /var/log/*
  408  grep Spare /var/log/*
  409  mdadm -vvv -A --update=summaries --run /dev/md2 /dev/sdc1 /dev/sdd1 
/dev/sde1 /dev/sdf1
  410  dmesg
  411  cat /proc/mdstat 
  412  mdadm -A /dev/md2 -f /dev/sdc1 -r /dev/sdc1
  413  mdadm /dev/md2 -f /dev/sdc1 -r /dev/sdc1
  414  ls -al /dev/sdc1
  415  man mdadm
  416  mdadm -vv /dev/md2 --fail /dev/sdc1
  417  cat /proc/mdstat 
  424  mdadm -vv /dev/md2 --fail /dev/sdf1
  [...]
  464  cat /proc/mdstat 
  465  mdadm -v -D  /dev/md2 
  466  cat /proc/mdstat 
  467  mdadm -E -v  /dev/md2 
  468  cat /proc/mdstat 
  469  mdadm -v -D  /dev/md2 
  470  mdadm /dev/md2 --add /dev/sdf1
  471  mdadm /dev/md2 --add /dev/sdc1
  472  mdadm /dev/md2 -r /dev/sdf1
  473  mdadm /dev/md2 -f /dev/sdf1
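
Looking back at that history, I notice that some of my own invocations
were probably malformed, which may explain some of the errors
mentioned below; as far as I understand the man page, the correct
forms would have been:

phobos:~# mdadm /dev/md2 --remove /dev/sdd1   # --remove needs the array first
phobos:~# mdadm -E /dev/sdd1        # -E examines a component, not /dev/md2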

I got some strange errors, such as 'device not found' for /dev/sdc1.
The last syslog entries may help in finding out what happened during
those actions:

Apr 13 19:25:15 phobos mdadm[3145]: DeviceDisappeared event detected on md 
device /dev/md2
Apr 13 19:25:18 phobos kernel: [   34.800007] eth1: no IPv6 routers present
Apr 13 19:28:14 phobos kernel: [  210.449173] md: md2 stopped.
Apr 13 19:28:14 phobos kernel: [  210.449227] md: unbind<sdd1>
Apr 13 19:28:14 phobos kernel: [  210.449271] md: export_rdev(sdd1)
Apr 13 19:28:14 phobos kernel: [  210.449358] md: unbind<sdc1>
Apr 13 19:28:14 phobos kernel: [  210.449395] md: export_rdev(sdc1)
Apr 13 19:28:14 phobos kernel: [  210.449450] md: unbind<sdf1>
Apr 13 19:28:14 phobos kernel: [  210.449487] md: export_rdev(sdf1)
Apr 13 19:28:14 phobos kernel: [  210.449542] md: unbind<sde1>
Apr 13 19:28:14 phobos kernel: [  210.449578] md: export_rdev(sde1)
Apr 13 19:38:44 phobos kernel: [  840.840946] md: md2 stopped.
Apr 13 19:38:44 phobos kernel: [  840.925279] md: bind<sde1>
Apr 13 19:38:44 phobos kernel: [  840.925910] md: bind<sdf1>
Apr 13 19:38:44 phobos kernel: [  840.926630] md: bind<sdc1>
Apr 13 19:38:44 phobos kernel: [  840.927380] md: bind<sdd1>
Apr 13 19:42:35 phobos kernel: [ 1071.388437] md: md2 stopped.
Apr 13 19:42:35 phobos kernel: [ 1071.388491] md: unbind<sdd1>
Apr 13 19:42:35 phobos kernel: [ 1071.388535] md: export_rdev(sdd1)
Apr 13 19:42:35 phobos kernel: [ 1071.388579] md: unbind<sdc1>
Apr 13 19:42:35 phobos kernel: [ 1071.388615] md: export_rdev(sdc1)
Apr 13 19:42:35 phobos kernel: [ 1071.388654] md: unbind<sdf1>
Apr 13 19:42:35 phobos kernel: [ 1071.388690] md: export_rdev(sdf1)
Apr 13 19:42:35 phobos kernel: [ 1071.388729] md: unbind<sde1>
Apr 13 19:42:35 phobos kernel: [ 1071.388765] md: export_rdev(sde1)
Apr 13 19:42:47 phobos kernel: [ 1083.401114] md: md2 stopped.
Apr 13 19:42:47 phobos kernel: [ 1083.484335] md: bind<sde1>
Apr 13 19:42:47 phobos kernel: [ 1083.484840] md: bind<sdf1>
Apr 13 19:42:47 phobos kernel: [ 1083.485510] md: bind<sdc1>
Apr 13 19:42:47 phobos kernel: [ 1083.486259] md: bind<sdd1>
Apr 13 19:43:07 phobos kernel: [ 1103.269348] md: md2 stopped.
Apr 13 19:43:07 phobos kernel: [ 1103.269402] md: unbind<sdd1>
Apr 13 19:43:07 phobos kernel: [ 1103.269448] md: export_rdev(sdd1)
Apr 13 19:43:07 phobos kernel: [ 1103.269491] md: unbind<sdc1>
Apr 13 19:43:07 phobos kernel: [ 1103.269527] md: export_rdev(sdc1)
Apr 13 19:43:07 phobos kernel: [ 1103.269566] md: unbind<sdf1>
Apr 13 19:43:07 phobos kernel: [ 1103.269602] md: export_rdev(sdf1)
Apr 13 19:43:07 phobos kernel: [ 1103.269641] md: unbind<sde1>
Apr 13 19:43:07 phobos kernel: [ 1103.269676] md: export_rdev(sde1)
Apr 13 22:38:33 phobos kernel: [11629.789139] md: md2 stopped.
Apr 13 22:38:33 phobos kernel: [11629.833527] md: bind<sde1>
Apr 13 22:38:33 phobos kernel: [11629.834291] md: bind<sdf1>
Apr 13 22:38:33 phobos kernel: [11629.835466] md: bind<sdc1>
Apr 13 22:38:33 phobos kernel: [11629.836353] md: bind<sdd1>
Apr 13 23:20:55 phobos kernel: [14171.784713] md: md2 stopped.
Apr 13 23:20:55 phobos kernel: [14171.784749] md: unbind<sdd1>
Apr 13 23:20:55 phobos kernel: [14171.784773] md: export_rdev(sdd1)
Apr 13 23:20:55 phobos kernel: [14171.784796] md: unbind<sdc1>
Apr 13 23:20:55 phobos kernel: [14171.784812] md: export_rdev(sdc1)
Apr 13 23:20:55 phobos kernel: [14171.784831] md: unbind<sdf1>
Apr 13 23:20:55 phobos kernel: [14171.784846] md: export_rdev(sdf1)
Apr 13 23:20:55 phobos kernel: [14171.784864] md: unbind<sde1>
Apr 13 23:20:55 phobos kernel: [14171.784879] md: export_rdev(sde1)
Apr 13 23:22:19 phobos kernel: [14255.160618] md: md2 stopped.
Apr 13 23:22:19 phobos kernel: [14255.192472] md: bind<sde1>
Apr 13 23:22:19 phobos kernel: [14255.192978] md: bind<sdf1>
Apr 13 23:22:19 phobos kernel: [14255.193545] md: bind<sdc1>
Apr 13 23:22:19 phobos kernel: [14255.194293] md: bind<sdd1>
Apr 13 23:22:54 phobos kernel: [14290.304205] md: md2 stopped.
Apr 13 23:22:54 phobos kernel: [14290.304235] md: unbind<sdd1>
Apr 13 23:22:54 phobos kernel: [14290.304256] md: export_rdev(sdd1)
Apr 13 23:22:54 phobos kernel: [14290.304278] md: unbind<sdc1>
Apr 13 23:22:54 phobos kernel: [14290.304290] md: export_rdev(sdc1)
Apr 13 23:22:54 phobos kernel: [14290.304307] md: unbind<sdf1>
Apr 13 23:22:54 phobos kernel: [14290.304320] md: export_rdev(sdf1)
Apr 13 23:22:54 phobos kernel: [14290.304337] md: unbind<sde1>
Apr 13 23:22:54 phobos kernel: [14290.304349] md: export_rdev(sde1)
Apr 13 23:23:00 phobos kernel: [14296.392962] md: md2 stopped.
Apr 13 23:23:00 phobos kernel: [14296.427224] md: bind<sde1>
Apr 13 23:23:00 phobos kernel: [14296.427989] md: bind<sdf1>
Apr 13 23:23:00 phobos kernel: [14296.428700] md: bind<sdc1>
Apr 13 23:23:00 phobos kernel: [14296.429562] md: bind<sdd1>
Apr 13 23:31:02 phobos kernel: [14778.314244] md: md2 stopped.
Apr 13 23:31:02 phobos kernel: [14778.314275] md: unbind<sdd1>
Apr 13 23:31:02 phobos kernel: [14778.314297] md: export_rdev(sdd1)
Apr 13 23:31:02 phobos kernel: [14778.314318] md: unbind<sdc1>
Apr 13 23:31:02 phobos kernel: [14778.314331] md: export_rdev(sdc1)
Apr 13 23:31:02 phobos kernel: [14778.314347] md: unbind<sdf1>
Apr 13 23:31:02 phobos kernel: [14778.314360] md: export_rdev(sdf1)
Apr 13 23:31:02 phobos kernel: [14778.314377] md: unbind<sde1>
Apr 13 23:31:02 phobos kernel: [14778.314390] md: export_rdev(sde1)
Apr 13 23:31:02 phobos kernel: [14778.356247] md: bind<sde1>
Apr 13 23:31:02 phobos kernel: [14778.356817] md: bind<sdf1>
Apr 13 23:31:02 phobos kernel: [14778.357499] md: bind<sdc1>
Apr 13 23:31:02 phobos kernel: [14778.358374] md: bind<sdd1>
Apr 13 23:31:02 phobos kernel: [14778.420706] raid10: not enough operational 
mirrors for md2
Apr 13 23:31:02 phobos kernel: [14778.420753] md: pers->run() failed ...

I hope this (long) bug report will help you find out what happened,
and correct, or help in correcting, that bug.  Obviously, I also hope
(very hard) that my data can be recovered, but I am not sure which
actions I should take for that. Therefore, I am waiting for your
input.

Best Regards.
Pierre.  

PS: Of course, if I missed or misunderstood anything, just tell me.

-- Package-specific info:

-- System Information:
Debian Release: 5.0.4
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-2-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages mdadm depends on:
ii  debconf                   1.5.24         Debian configuration management sy
ii  libc6                     2.7-18lenny2   GNU C Library: Shared libraries
ii  lsb-base                  3.2-20         Linux Standard Base 3.2 init scrip
ii  makedev                   2.3.1-88       creates device files in /dev
ii  udev                      0.125-7+lenny3 /dev/ and hotplug management daemo

Versions of packages mdadm recommends:
ii  module-init-tools             3.4-1      tools for managing Linux kernel mo
ii  ssmtp [mail-transport-agent]  2.62-3     extremely simple MTA to get mail o

mdadm suggests no packages.

-- debconf information:
  mdadm/autostart: true
  mdadm/mail_to: root
  mdadm/initrdstart_msg_errmd:
  mdadm/initrdstart: all
  mdadm/initrdstart_msg_errconf:
  mdadm/initrdstart_notinconf: false
  mdadm/initrdstart_msg_errexist:
  mdadm/initrdstart_msg_intro:
  mdadm/autocheck: true
  mdadm/initrdstart_msg_errblock:
  mdadm/start_daemon: true

-- 
Pierre Vignéras




