Bug#578352: closed by martin f krafft <madduck at debian.org> (Re: Bug#578352: mdadm: failed devices become spares!)

Pierre Vignéras pierre at vigneras.name
Tue Apr 27 18:17:36 UTC 2010


On Tuesday 20 April 2010, you wrote:
> thus spoke Pierre Vignéras <pierre at vigneras.name> [2010.04.20.1317 +0200]:
> > Apr 12 20:10:02 phobos mdadm[3157]: Fail event detected on md device
> > /dev/md2, component device /dev/sdf1
> > Apr 12 20:11:02 phobos mdadm[3157]: SpareActive event detected on md
> > device /dev/md2, component device /dev/sdf1
> >
> > And at that time I was not logged in, and I did not touch
> > that NFS server at all (neither the USB drives nor the server itself).
> > In fact, I only discovered the problem the next day. So the first question
> > is:
> >
> > is it normal that, after a failure is detected on /dev/sdf1, it becomes
> > a spare (again, if I understand the syslog message correctly)?
> 
> It seems the drive went offline and came back, in which case
> I think it would be normal. Are there no kernel messages about this?

Well, here is the content of my kern.log:

Apr 12 19:22:44 phobos kernel: [5768580.538554] ip_tables: (C) 2000-2006 Netfilter Core Team
Apr 12 20:10:02 phobos kernel: [5771419.310123] sd 5:0:0:0: [sdf] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 12 20:10:02 phobos kernel: [5771419.310144] end_request: I/O error, dev sdf, sector 115347706
Apr 12 20:10:02 phobos kernel: [5771419.310156] raid10: Disk failure on sdf1, disabling device.
Apr 12 20:10:02 phobos kernel: [5771419.310158] raid10: Operation continuing on 3 devices.
Apr 12 20:10:02 phobos kernel: [5771419.323466] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323480]  --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323488]  disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323495]  disk 1, wo:1, o:0, dev:sdf1
Apr 12 20:10:02 phobos kernel: [5771419.323501]  disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323508]  disk 3, wo:0, o:1, dev:sde1
Apr 12 20:10:02 phobos kernel: [5771419.323801] RAID10 conf printout:
Apr 12 20:10:02 phobos kernel: [5771419.323813]  --- wd:3 rd:4
Apr 12 20:10:02 phobos kernel: [5771419.323820]  disk 0, wo:0, o:1, dev:sdd1
Apr 12 20:10:02 phobos kernel: [5771419.323826]  disk 2, wo:0, o:1, dev:sdc1
Apr 12 20:10:02 phobos kernel: [5771419.323833]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.091249] sd 2:0:0:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.091272] end_request: I/O error, dev sdd, sector 115351425
Apr 13 08:00:02 phobos kernel: [5814019.091283] raid10: Disk failure on sdd1, disabling device.
Apr 13 08:00:02 phobos kernel: [5814019.091285] raid10: Operation continuing on 2 devices.
Apr 13 08:00:02 phobos kernel: [5814019.110225] md: recovery of RAID array md2
Apr 13 08:00:02 phobos kernel: [5814019.110250] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
Apr 13 08:00:02 phobos kernel: [5814019.110265] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Apr 13 08:00:02 phobos kernel: [5814019.110293] md: using 128k window, over a total of 312568576 blocks.
Apr 13 08:00:02 phobos kernel: [5814019.110308] md: resuming recovery of md2 from checkpoint.
Apr 13 08:00:02 phobos kernel: [5814019.110323] md: md2: recovery done.
Apr 13 08:00:02 phobos kernel: [5814019.133498] sd 2:0:0:0: [sdd] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK,SUGGEST_RETRY
Apr 13 08:00:02 phobos kernel: [5814019.133533] end_request: I/O error, dev sdd, sector 115351428
Apr 13 08:00:02 phobos kernel: [5814019.133842] I/O error in filesystem ("dm-7") meta-data dev dm-7 block 0x1403d63       ("xlog_iodone") error 5 buf count 32768
Apr 13 08:00:02 phobos kernel: [5814019.133876] xfs_force_shutdown(dm-7,0x2) called from line 1026 of file fs/xfs/xfs_log.c.  Return address = 0xf8a351e2
Apr 13 08:00:02 phobos kernel: [5814019.133942] Filesystem "dm-7": Log I/O Error Detected.  Shutting down filesystem: dm-7
Apr 13 08:00:02 phobos kernel: [5814019.133966] Please umount the filesystem, and rectify the problem(s)
Apr 13 08:00:02 phobos kernel: [5814019.136669] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.136690]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.136704]  disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.136718]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.136731]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.139509] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.139529]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.139542]  disk 0, wo:1, o:0, dev:sdd1
Apr 13 08:00:02 phobos kernel: [5814019.139556]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.139569]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.140077] xfs_force_shutdown(dm-7,0x2) called from line 789 of file fs/xfs/xfs_log.c.  Return address = 0xf8a36400
Apr 13 08:00:02 phobos kernel: [5814019.140376] RAID10 conf printout:
Apr 13 08:00:02 phobos kernel: [5814019.140394]  --- wd:2 rd:4
Apr 13 08:00:02 phobos kernel: [5814019.140408]  disk 2, wo:0, o:1, dev:sdc1
Apr 13 08:00:02 phobos kernel: [5814019.140421]  disk 3, wo:0, o:1, dev:sde1
Apr 13 08:00:02 phobos kernel: [5814019.143330] nfsd: non-standard errno: 5
Apr 13 08:00:02 phobos kernel: [5814019.143806] nfsd: non-standard errno: 5
Apr 13 08:00:02 phobos kernel: [5814019.144412] nfsd: non-standard errno: 5
[...and so on...]
Apr 13 08:00:03 phobos kernel: [5814019.732063] nfsd: non-standard errno: 5
Apr 13 08:00:04 phobos kernel: [5814021.301124] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 08:00:05 phobos kernel: [5814021.653520] nfsd: non-standard errno: 5
Apr 13 08:00:05 phobos last message repeated 25 times
Apr 13 08:00:05 phobos kernel: [5814021.653521] nfsd: non-standard errno: 5
Apr 13 08:00:05 phobos last message repeated 27 times
[...and so on...]
Apr 13 08:00:05 phobos kernel: [5814021.680261] nfsd: non-standard errno: 5
Apr 13 08:00:07 phobos kernel: [5814024.288015] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 08:00:43 phobos kernel: [5814060.296016] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 08:01:19 phobos kernel: [5814096.296013] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 08:01:55 phobos kernel: [5814132.296014] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 08:02:31 phobos kernel: [5814168.296015] Filesystem "dm-7": xfs_log_force: error 5 returned.
[...and so on...]
Apr 13 18:47:31 phobos kernel: [5852868.316015] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 18:47:37 phobos kernel: [5852873.772006] I/O error in filesystem ("dm-6") meta-data dev dm-6 block 0x6b9e8       ("xfs_trans_read_buf") error 5 buf count 4096
Apr 13 18:47:37 phobos last message repeated 20 times
Apr 13 18:47:37 phobos kernel: [5852873.799028] I/O error in filesystem ("dm-6") meta-data dev dm-6 block 0x6b9e8       ("xfs_trans_read_buf") error 5 buf count 4096
[...and so on...]
Apr 13 18:49:22 phobos kernel: [5852979.352288] I/O error in filesystem ("dm-6") meta-data dev dm-6 block 0x165a80       ("xfs_trans_read_buf") error 5 buf count 8192
Apr 13 18:49:22 phobos kernel: [5852979.352288] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-6.  Returning error.
[...and so on...]
Apr 13 19:22:12 phobos kernel: [5854948.560651] Filesystem "dm-7": xfs_log_force: error 5 returned.
Apr 13 19:22:12 phobos kernel: [5854948.560686] xfs_force_shutdown(dm-7,0x1) called from line 420 of file fs/xfs/xfs_rw.c.  Return address = 0xf8a48ce7
Apr 13 19:22:15 phobos kernel: [5854951.619319] Device dm-6, XFS metadata write error block 0x63ff3d8 in dm-6

Then I rebooted...

Apr 13 19:22:15 phobos kernel: Kernel logging (proc) stopped.
Apr 13 19:22:15 phobos kernel: Kernel log daemon terminating
Apr 13 19:25:07 phobos kernel: klogd 1.5.0#5, log source = /proc/kmsg started.
Apr 13 19:25:07 phobos kernel: [    0.000000] Initializing cgroup subsys cpuset
Apr 13 19:25:07 phobos kernel: [    0.000000] Initializing cgroup subsys cpu
Apr 13 19:25:07 phobos kernel: [    0.000000] Linux version 2.6.26-2-686 (Debian 2.6.26-21lenny4) (dannf at debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Tue Mar 9 17:35:51 UTC 2010
Apr 13 19:25:07 phobos kernel: [    0.000000] BIOS-provided physical RAM map:

...
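The md events buried in a kern.log like the one above can be pulled out with a small filter. What follows is a minimal Python sketch of a hypothetical helper (md_timeline is not part of mdadm or the kernel); it assumes the syslog line format shown above, with each message on a single line, and it only recognizes the two raid10 messages quoted in that log.

```python
import re
from typing import List, Tuple

# Syslog prefix as seen above: "Apr 12 20:10:02 phobos kernel: [5771419.310156] "
PREFIX = r"^(?P<ts>\w{3} [ \d]\d \d\d:\d\d:\d\d) \S+ kernel: \[\s*[\d.]+\] "
FAILURE = re.compile(PREFIX + r"raid10: Disk failure on (?P<dev>\S+), disabling device\.")
CONTINUING = re.compile(PREFIX + r"raid10: Operation continuing on (?P<n>\d+) devices\.")


def md_timeline(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs for raid10 failure messages, in log order."""
    events = []
    for line in lines:
        m = FAILURE.match(line)
        if m:
            events.append((m.group("ts"), "failed " + m.group("dev")))
            continue
        m = CONTINUING.match(line)
        if m:
            events.append((m.group("ts"), m.group("n") + " devices left"))
    return events
```

Fed the lines of /var/log/kern.log (for example `md_timeline(open("/var/log/kern.log").read().splitlines())`), it would show sdf1 being disabled at 20:10:02 on 12 April and sdd1 at 08:00:02 on 13 April, leaving two of the four devices.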

> > what should I do to recover my data? You suggest removing the
> > previous one, but I don't see how:
> >
> > mdadm /dev/md2 --remove ?? (according to /proc/mdstat, /dev/sdf1
> > and /dev/sdc1 are now spares).
> 
> Try
> 
>   mdadm --remove /dev/md2 /dev/sdc1
>   mdadm --add /dev/md2 /dev/sdc1
>   mdadm --remove /dev/md2 /dev/sdf1
>   mdadm --add /dev/md2 /dev/sdf1
> 
> and post all the output.
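The suggested remove/add sequence can also be scripted so that each command and its output are captured for posting. This is a hypothetical Python sketch (readd_commands and run are illustrative names); it builds exactly the four mdadm invocations listed above, in the same order, and by default only prints them instead of running them.

```python
import subprocess
from typing import List


def readd_commands(array: str, members: List[str]) -> List[List[str]]:
    """Build the remove-then-add mdadm invocations for each member, in order."""
    cmds = []
    for dev in members:
        cmds.append(["mdadm", "--remove", array, dev])
        cmds.append(["mdadm", "--add", array, dev])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(" ".join(argv))
        if not dry_run:
            # No check=True: keep going after a failure so all output can be posted.
            result = subprocess.run(argv, capture_output=True, text=True)
            print(result.stdout, result.stderr, sep="", end="")
```

`run(readd_commands("/dev/md2", ["/dev/sdc1", "/dev/sdf1"]))` prints the four commands above; passing `dry_run=False` would actually execute them and requires root.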

Ok. Here is the result:

phobos:/var/log# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 00.90
  Creation Time : Thu Aug  6 01:59:44 2009
     Raid Level : raid10
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Tue Apr 13 19:22:21 2010
          State : active, degraded, Not Started
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

         Layout : near=2, far=1
     Chunk Size : 64K

           UUID : b34f4192:f823df58:24bf28c1:396de87f (local to host phobos)
         Events : 0.90612

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       0        0        1      removed
       2       8       49        2      active sync   /dev/sdd1
       3       8       65        3      active sync   /dev/sde1

       4       8       81        -      spare   /dev/sdf1
       5       8       33        -      spare   /dev/sdc1
phobos:/var/log# mdadm --remove /dev/md2 /dev/sdc1
mdadm: hot remove failed for /dev/sdc1: No such device
phobos:/var/log#

Strange, isn't it?
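The device table at the end of the `mdadm --detail` output above is what separates the active members from the spares and the removed slots. A minimal Python sketch of a hypothetical parser for that table, assuming the column layout shown above (Number, Major, Minor, RaidDevice, State, then an optional device path, with "-" as RaidDevice for spares):

```python
import re
from typing import List, NamedTuple, Optional


class Member(NamedTuple):
    number: int
    major: int
    minor: int
    raid_device: Optional[int]  # None when mdadm prints "-" (spares)
    state: str                  # e.g. "active sync", "spare", "removed"
    device: Optional[str]       # None for "removed" slots


ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+|-)\s+(.*?)\s*(/dev/\S+)?\s*$")


def parse_detail_table(text: str) -> List[Member]:
    """Parse the rows following the 'Number Major Minor RaidDevice State' header."""
    members = []
    in_table = False
    for line in text.splitlines():
        if line.strip().startswith("Number"):
            in_table = True
            continue
        if not in_table:
            continue
        m = ROW.match(line)
        if m is None:
            continue  # blank separator lines, shell prompts, error messages
        num, major, minor, rd, state, dev = m.groups()
        members.append(Member(int(num), int(major), int(minor),
                              None if rd == "-" else int(rd), state, dev))
    return members
```

Applied to the output above, `[m.device for m in parse_detail_table(out) if m.state == "spare"]` would give /dev/sdf1 and /dev/sdc1, while raid devices 0 and 1 show up as removed slots with no device attached.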

phobos:/var/log# ls -l /dev/sdc1
brw-rw---- 1 root floppy 8, 33 2010-04-13 23:31 /dev/sdc1
phobos:/var/log#

It seems strange to me that the group is 'floppy',
but it is the same for all the USB drives, so I guess it's fine.

phobos:/var/log# ls -l /dev/sd*
brw-rw---- 1 root disk   8,  0 2010-04-13 19:24 /dev/sda
brw-rw---- 1 root disk   8,  1 2010-04-13 19:24 /dev/sda1
brw-rw---- 1 root disk   8,  2 2010-04-13 19:24 /dev/sda2
brw-rw---- 1 root disk   8, 16 2010-04-13 19:24 /dev/sdb
brw-rw---- 1 root disk   8, 17 2010-04-13 19:24 /dev/sdb1
brw-rw---- 1 root disk   8, 18 2010-04-13 19:24 /dev/sdb2
brw-rw---- 1 root disk   8, 19 2010-04-13 19:24 /dev/sdb3
brw-rw---- 1 root floppy 8, 32 2010-04-13 19:24 /dev/sdc
brw-rw---- 1 root floppy 8, 33 2010-04-13 23:31 /dev/sdc1
brw-rw---- 1 root floppy 8, 34 2010-04-13 19:24 /dev/sdc2
brw-rw---- 1 root floppy 8, 35 2010-04-13 19:24 /dev/sdc3
brw-rw---- 1 root floppy 8, 48 2010-04-13 19:24 /dev/sdd
brw-rw---- 1 root floppy 8, 49 2010-04-13 23:31 /dev/sdd1
brw-rw---- 1 root floppy 8, 64 2010-04-13 19:24 /dev/sde
brw-rw---- 1 root floppy 8, 65 2010-04-13 23:31 /dev/sde1
brw-rw---- 1 root floppy 8, 66 2010-04-13 19:24 /dev/sde2
brw-rw---- 1 root floppy 8, 67 2010-04-13 19:24 /dev/sde3
brw-rw---- 1 root floppy 8, 80 2010-04-13 19:24 /dev/sdf
brw-rw---- 1 root floppy 8, 81 2010-04-13 23:31 /dev/sdf1
brw-rw---- 1 root floppy 8, 82 2010-04-13 19:24 /dev/sdf2
phobos:/var/log#
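The "8, 33" pair in this listing is the block device's major and minor number, the same values mdadm prints in its Major and Minor columns. For the SCSI disk driver (major 8), each disk gets 16 consecutive minors: the first for the whole disk, the rest for its partitions. A minimal Python sketch of that mapping (sd_name is a hypothetical helper), deliberately limited to major 8, which only covers sda through sdp:

```python
def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair under SCSI disk major 8 to its sdX[N] name."""
    if major != 8:
        raise ValueError("only SCSI disk major 8 (sda..sdp) is handled")
    disk, partition = divmod(minor, 16)  # 16 minors per disk
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")
```

By this arithmetic 8,33 is sdc1 and 8,81 is sdf1, the two devices mdadm lists as spares, while 8,49 and 8,65 are sdd1 and sde1, the two active members.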

By the way, I can access that drive:

phobos:/var/log# strings  /dev/sdc1|head
LABELONE
LVM2 001oQcXx95Eazcja3nCnP2owexSjLk1sFiZ
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTz
 LVM2 x[5A%r0N*>
rs.RW.1 {
id = "11SAPG-98i8-6zl2-3xtU-1W5d-0mmA-0N30vU"
seqno = 1
status = ["RESIZEABLE", "READ", "WRITE"]
extent_size = 8192
max_lv = 0
phobos:/var/log#
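The `strings` output shows an LVM2 physical volume label (LABELONE, then LVM2 001) at the start of /dev/sdc1. That is consistent with 0.90 metadata (the "Version : 00.90" line above): the md superblock sits near the end of each component, so the array's data, here an LVM PV, starts at sector 0. The Python sketch below illustrates both points under stated assumptions: LVM2 looks for its label in the first four 512-byte sectors, with the type string at byte 24 of the label header; a 0.90 superblock starts at the last 64 KiB-aligned 64 KiB block of the device; and the magic is read as little-endian because this is an i686 host. Function names are hypothetical.

```python
SECTOR = 512
LVM_LABEL_SCAN_SECTORS = 4                  # LVM2 scans the first 4 sectors for its label
MD_RESERVED_SECTORS = 64 * 1024 // SECTOR   # 0.90 metadata reserves 64 KiB (128 sectors)
MD_SB_MAGIC_090 = 0xA92B4EFC


def find_lvm2_label(head: bytes):
    """Return the sector index holding an LVM2 label, or None if absent."""
    for sector in range(LVM_LABEL_SCAN_SECTORS):
        block = head[sector * SECTOR:(sector + 1) * SECTOR]
        # label_header: id "LABELONE" at byte 0, type "LVM2 001" at byte 24
        if block[0:8] == b"LABELONE" and block[24:32] == b"LVM2 001":
            return sector
    return None


def md090_superblock_sector(device_sectors: int) -> int:
    """Start sector of a 0.90 superblock: round down to 64 KiB, then back off 64 KiB."""
    return (device_sectors & ~(MD_RESERVED_SECTORS - 1)) - MD_RESERVED_SECTORS


def has_md090_magic(sb: bytes) -> bool:
    """Check the 0.90 magic; assumes a little-endian host such as this i686 box."""
    return int.from_bytes(sb[0:4], "little") == MD_SB_MAGIC_090
```

Read-only use would look like `with open("/dev/sdc1", "rb") as f: print(find_lvm2_label(f.read(4 * 512)))`; seeking to `md090_superblock_sector(size) * 512`, where `size` is the partition size in 512-byte sectors, and passing the bytes read there to `has_md090_magic` checks whether the md superblock is still present.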

So, I am lost...

Thanks for your time.
Regards.
-- 
Pierre Vignéras


More information about the pkg-mdadm-devel mailing list