Bug#610184: Some debugging of superfluous RAID member

Anthony DeRobertis aderobertis at metrics.net
Mon Jan 17 22:01:06 UTC 2011


First off, DEB_BUILD_OPTIONS=nostrip,noopt does not work right:
grub-probe is never built with -g -O0. I hacked up ./debian/rules a little
to force it (added CFLAGS='-g -O0' to confflags). I also removed most of
the packages from debian/control so I wouldn't have to wait through N
flavors of grub building.

I'm comparing grub-probe --target=device / on two machines with very
similar RAID configs: both have a RAID1 /boot (4 disks, no spares) and a
RAID10 for everything else (via LVM, 4 disks, no spares).

The difference seems to be that on the machine with the errors,
grub_mdraid_detect on hd0 returns GRUB_ERR_NONE, whereas on the other
machine it returns GRUB_ERR_OUT_OF_RANGE. On the failing machine, that is
because it finds the version 1.0 superblock that actually belongs to
/dev/sda2 when looking for one on /dev/sda.

I suppose the alignment just doesn't work out on the other machine: there,
the sector probed for the whole disk doesn't land on a partition's
superblock.

"Gond" is the machine that shows errors. "Zia" is the one that does not.

Zia:

    root@Zia:/home/anthony# sfdisk -d /dev/sda
    # partition table of /dev/sda
    unit: sectors

    /dev/sda1 : start=       63, size=   256977, Id=fd, bootable
    /dev/sda2 : start=   257040, size=   257040, Id= c
    /dev/sda3 : start=   514080, size=1953005985, Id=fd
    /dev/sda4 : start=        0, size=        0, Id= 0

    (gdb) p sector
    $18 = 1953525152
    (gdb) n
    401           if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
    (gdb) n
    405           if (sb_1x.magic == SB_MAGIC)
    (gdb) p sb_1x
    $19 = {magic = 0, major_version = 0, feature_map = 0, pad0 = 0, 
      set_uuid = '\000' <repeats 15 times>, set_name = '\000' <repeats 31 times>, ctime = 0, 
      level = 0, layout = 0, size = 0, chunksize = 0, raid_disks = 0, bitmap_offset = 0, 
      new_level = 0, reshape_position = 0, delta_disks = 0, new_layout = 0, new_chunk = 0, 
      pad1 = "\000\000\000", data_offset = 0, data_size = 0, super_offset = 0, recovery_offset = 0, 
      dev_number = 0, cnt_corrected_read = 0, device_uuid = '\000' <repeats 15 times>, 
      devflags = 0 '\000', pad2 = "\000\000\000\000\000\000", utime = 0, events = 0, 
      resync_offset = 0, sb_csum = 0, max_dev = 0, pad3 = '\000' <repeats 31 times>, 
      dev_roles = 0x7fffffffcc90}
    (gdb) p *disk
    $20 = {name = 0x671090 "hd0", dev = 0x6425c0, total_sectors = 1953525168, has_partitions = 1, 
      id = 0, partition = 0x0, read_hook = 0, data = 0x6763b0}
      

And Gond:

    root@Gond:~# sfdisk -d /dev/sda
    # partition table of /dev/sda
    unit: sectors

    /dev/sda1 : start=     2048, size=   262144, Id=fd, bootable
    /dev/sda2 : start=   264192, size=976508976, Id=fd
    /dev/sda3 : start=        0, size=        0, Id= 0
    /dev/sda4 : start=        0, size=        0, Id= 0


    (gdb) p sector
    $10 = 976773152
    (gdb) n
    401           if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
    (gdb) n
    405           if (sb_1x.magic == SB_MAGIC)
    (gdb) p sb_1x
    $11 = {magic = 2838187772, major_version = 1, feature_map = 1, pad0 = 0, 
      set_uuid = "\265\363\344\263yw\303H\342\"\203\373w\235n\217", 
      set_name = "Gond:1", '\000' <repeats 25 times>, ctime = 1279225379, level = 10, layout = 258, 
      size = 976507904, chunksize = 1024, raid_disks = 4, bitmap_offset = 4294967288, new_level = 0, 
      reshape_position = 0, delta_disks = 0, new_layout = 0, new_chunk = 0, pad1 = "\000\000\000", 
      data_offset = 0, data_size = 976508704, super_offset = 976508960, recovery_offset = 0, 
      dev_number = 0, cnt_corrected_read = 0, device_uuid = "\361\344Wp+\006\245\346(?\307\027[Lj.", 
      devflags = 0 '\000', pad2 = "\000\000\000\000\000\000", utime = 1295300868, events = 36368, 
      resync_offset = 18446744073709551615, sb_csum = 2056702893, max_dev = 384, 
      pad3 = '\000' <repeats 31 times>, dev_roles = 0x7fffffffd840}
    (gdb) p *disk
    $13 = {name = 0x671090 "hd0", dev = 0x6425c0, total_sectors = 976773168, has_partitions = 1, 
      id = 0, partition = 0x0, read_hook = 0, data = 0x6710b0}
      

So, this bug has probably been present for a while, but maybe didn't
matter until the recent error checks?
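
The overlap can be checked by hand from the sfdisk and gdb output above. A
rough sketch in Python, assuming the usual version 1.0 placement (8 KiB
before the end of the device, rounded down to a 4 KiB boundary); that
formula is my assumption, though it does reproduce both `p sector` values.
The function names are made up for illustration.

```python
SECTORS_FROM_END = 16  # 8 KiB in 512-byte sectors (assumed v1.0 placement)
ALIGN = 8              # 4 KiB alignment in 512-byte sectors (assumed)


def sb_1_0_sector(device_sectors):
    """Sector of a v1.0 superblock, relative to the start of the device."""
    return (device_sectors - SECTORS_FROM_END) & ~(ALIGN - 1)


def partition_collides(disk_sectors, part_start, part_size):
    """True if the partition's superblock sits exactly where the
    whole-disk probe looks."""
    return part_start + sb_1_0_sector(part_size) == sb_1_0_sector(disk_sectors)


# Values copied from the sfdisk -d and gdb output for each machine.
gond_disk, gond_sda2 = 976773168, (264192, 976508976)
zia_disk, zia_sda3 = 1953525168, (514080, 1953005985)

print(sb_1_0_sector(gond_disk))                   # 976773152, matches "p sector"
print(sb_1_0_sector(gond_sda2[1]))                # 976508960, matches super_offset
print(partition_collides(gond_disk, *gond_sda2))  # True
print(partition_collides(zia_disk, *zia_sda3))    # False
```

On Gond, sda2 runs right up to the end of the disk (264192 + 976508976 =
976773168), so both probes hit the same absolute sector. On Zia, sda3 stops
short of the end of the disk, so they don't.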

Somehow there is a way around this, as it doesn't confuse mdadm -E:

    root@Gond:~# mdadm -E /dev/sda
    mdadm: No md superblock detected on /dev/sda.
      

So, it can be fixed. One thing that quickly jumps out at me: maybe
grub_mdraid_detect should check that sector == sb_1x.super_offset.
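
To illustrate the idea, here is a small Python model of that check. It is
not GRUB code: Super1x and accept_superblock are hypothetical names, and
only the two fields that matter are modelled. It assumes super_offset is
relative to the start of the device that owns the superblock, which is what
the Gond dump suggests (976508960 is the offset within sda2, not sda).

```python
from dataclasses import dataclass

SB_MAGIC = 0xA92B4EFC  # same value as magic = 2838187772 in the gdb dump


@dataclass
class Super1x:
    """Hypothetical stand-in for the two fields of interest in sb_1x."""
    magic: int
    super_offset: int


def accept_superblock(sb, probed_sector):
    """Accept only if the superblock claims to live at the sector we read,
    measured from the start of the device being probed."""
    return sb.magic == SB_MAGIC and sb.super_offset == probed_sector


# Superblock values from the Gond gdb session.
gond_sb = Super1x(magic=2838187772, super_offset=976508960)

# Probing the whole disk (hd0): the read was at sector 976773152.
print(accept_superblock(gond_sb, 976773152))  # False: belongs to a partition

# Probing sda2 itself: the same superblock is at sector 976508960.
print(accept_superblock(gond_sb, 976508960))  # True
```

With a check like this, the stray hit on /dev/sda would be rejected while
the real member, /dev/sda2, would still be detected.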

I can work up a patch to do that if you'd like.




