Bug#728677: mdadm: has no superblock - assembly aborted (but it has)

Sat Dec 6 07:25:37 UTC 2014

06.12.2014 00:06, Marc Lehmann wrote:
> On Fri, Dec 05, 2014 at 01:23:19PM +0300, Michael Tokarev <mjt at tls.msk.ru> wrote:
>> I'm not sure what to make out of this bugreport really.  Tagging with
>> `moreinfo' for now.
> 
> Not sure what more info I could deliver, it seems to be clear from the
> bugreport as-is. However, the volume is long gone, so I now have the same
> information as you.

Did it happen only once, or was it a persistent problem happening on every
boot?  I mean the inability to assemble the raid device.

>> What I notice, however, is that ata-HGST_HDS724040ALE640_2 contains a
>> partition table, and mdadm correctly detects this.  So this is the first
>> data device in the array, and the actual data starts there.  With superblock
>> format v1.0 this is also the start of the drive, i.e., the same partition
>> table exists on this drive and on your array.  This is a sure way of
>> causing problems, since the system first sees your ata drive, tries to
>> parse the partition table in there, and might even try to mount some partitions,
>> making the drive busy, so mdadm is unable to open it for assembly.
> 
> Nothing in my system will randomly mount partitions it hasn't explicitly
> been asked to mount.

You don't use systemd, do you?  (Not trying to open a can of worms)

> In any case, if this were true, it's simply a bug in mdadm - mdadm should
> not say "there is no superblock" when there really is a superblock but
> something simply kept it from opening the device.

Yes, I thought about that.  I don't see such an error in the code, and I
do remember mdadm reporting the correct errors when the device it was
asked to use was busy or there was an I/O error on it.  The result,
however, is the same: it can't assemble the array this way, given an
explicit list of drives to assemble it from.  So if anything, it is a
cosmetic error: the wrong error is reported, but the behaviour
is the same.
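
To illustrate the distinction being argued about here, a minimal sketch in
Python (not mdadm's code; the function name probe_v10_member is hypothetical)
that keeps the failure modes apart: failing to open or read the device is
reported as such, and "no superblock" is reported only when the device was
actually read and the md v1 magic (0xa92b4efc, stored little-endian) is
absent at the 1.0 location.  It assumes the 1.0 placement rule used by the
Linux md driver (8 KiB back from the end, rounded down to a 4 KiB boundary)
and opens read-only; an exclusive open (O_EXCL on a Linux block device,
which fails with EBUSY when the device is in use) would land in the same
"cannot open" branch.

```python
import os
import struct
import sys

MD_SB_MAGIC = 0xA92B4EFC  # md v1 superblock magic, stored little-endian


def probe_v10_member(path: str) -> str:
    """Hypothetical probe that reports *why* no 1.0 superblock was obtained."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as e:
        # Busy, missing, permission denied, ...: report the real reason.
        return f"cannot open {path}: {e.strerror}"
    try:
        sectors = os.lseek(fd, 0, os.SEEK_END) // 512
        if sectors < 16:
            return f"{path}: too small for 1.0 metadata"
        # Assumed 1.0 placement: 8 KiB back from the end, rounded down to 4 KiB.
        sb_sector = (sectors - 16) & ~7
        magic = os.pread(fd, 4, sb_sector * 512)
    except OSError as e:
        # The device opened but could not be read: also not "no superblock".
        return f"read error on {path}: {e.strerror}"
    finally:
        os.close(fd)
    if len(magic) == 4 and struct.unpack("<I", magic)[0] == MD_SB_MAGIC:
        return f"{path}: 1.0 superblock found at sector {sb_sector}"
    # Only at this point is "no superblock" an accurate message.
    return f"{path}: no 1.0 superblock at sector {sb_sector}"


if __name__ == "__main__":
    for dev in sys.argv[1:]:
        print(probe_v10_member(dev))
```

Run against the member devices, a busy or unreadable device then yields a
different message from one that was read and simply lacks the magic.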

>> When you create a raid5 array, use superblock version 1.1 or 1.2 (1.2 is
>> the default), so no other tools will try to access your component devices
>> outside the array.
> 
> You seem to be confused about the idea behind this bug report: the idea was
> to report an actual bug in the tool; I didn't try to abuse the bug tracker to
> get support. I don't have an actual problem with mdadm, I simply reported
> that mdadm gives a wrong error message.

Hence the 'moreinfo'.  I wasn't confused; I just didn't know ;)

>> With this context I'd say it is an operator error.
> 
> Again, I think you misunderstood - I was reporting a bug, not asking for
> support. It might or might not be true that what I did is not the optimal
> way, or the way it should be done, or the state-of-the-art technique to do
> things, but that is completely irrelevant, as this report is about a wrong
> message in mdadm.

Now it looks like it is you who's confused.  What I mentioned above
about raid5 and having metadata at the end of the component devices
is true regardless of whether you're asking for help or not (the same
applies to other raid levels as well, except raid1 aka mirror, because
raid1 is the only level where one can use a component device directly
without assembling the array).  And for some reason you created such
an array, so I merely pointed out that it is a potential source of problems.
It's not in any way a "solution" to this bugreport.
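
For reference, a small sketch of where each v1.x superblock sits on a
component device, assuming the placement rule of the Linux md driver (sizes
in 512-byte sectors; the function name and the 4 TB example size are only
illustrative).  With 1.0 the superblock is at the end and the array data
starts at sector 0 of the member, so with the default raid5 layout the first
sector of the array, including any partition table on it, is also the first
sector of the first component.  With 1.1 and 1.2 the superblock is at or
near the start and the data comes after it.

```python
def sb_offset_sectors(minor: int, dev_sectors: int) -> int:
    """Sector at which a v1.<minor> md superblock is stored (assumed md rule)."""
    if minor == 0:
        if dev_sectors < 16:
            raise ValueError("device too small for 1.0 metadata")
        # 1.0: at least 8 KiB back from the end, rounded down to 4 KiB.
        return (dev_sectors - 8 * 2) & ~(4 * 2 - 1)
    if minor == 1:
        # 1.1: at the very start of the device.
        return 0
    if minor == 2:
        # 1.2: 4 KiB from the start of the device.
        return 8
    raise ValueError(f"unknown v1 minor version: {minor}")


if __name__ == "__main__":
    # Example member size: a 4 TB drive (4000787030016 bytes), assumed.
    dev_sectors = 4_000_787_030_016 // 512
    for minor in (0, 1, 2):
        print(f"1.{minor}: superblock at sector {sb_offset_sectors(minor, dev_sectors)}")
```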

But anyway, I still don't know what to do about this bug.  If only it were
possible to reproduce it, but heck... ;)  I tried to re-create this
situation yesterday in qemu/kvm, but it all just works as expected.

One more wild guess.  Is it possible this has something to do with an HPA
(host protected area) on the drive(s), whereby the drive can be made to
report a larger or smaller size by issuing a special command to it?  That'd
explain things, because 1.0 metadata placement is tied to the end of the
device, and if the device changes size this metadata is impossible to find.
But it is a wild guess indeed, because in your case it only happened to one
drive, not all of them, and it can't be just bad luck that this is also
the first drive in the array and contains the partition table.  I still
think it has something to do with the partition table and mdadm
misreporting something like "busy" this way.
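
The HPA guess can be made concrete with a sketch using illustrative numbers
(not taken from the report): because the 1.0 location is computed from the
size the drive reports now, a drive whose visible size shrank after the
array was created is probed at a different sector than the one the
superblock was written to.  The drive size and the 1 MiB hidden by the HPA
are assumptions, the helper names are hypothetical, and the offset rule is
the one the Linux md driver uses for 1.0 metadata.

```python
def v10_sb_sector(dev_sectors: int) -> int:
    # 1.0 rule (assumed from the md driver): 8 KiB back from the end,
    # rounded down to a 4 KiB boundary; sizes in 512-byte sectors.
    return (dev_sectors - 16) & ~7


def sb_still_found(written_at: int, current_sectors: int) -> bool:
    """True if a superblock written at `written_at` is where the 1.0 rule
    would look on a device that now reports `current_sectors` sectors."""
    return v10_sb_sector(current_sectors) == written_at


if __name__ == "__main__":
    full = 7_814_037_168              # size when the array was created (assumed)
    written_at = v10_sb_sector(full)
    for visible in (full, full - 2048):   # second case: HPA hides 1 MiB (assumed)
        print(visible, sb_still_found(written_at, visible))
```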

Maybe changing the wording of this message to something more neutral,
like "unable to FIND superblock" instead of "has no superblock", will
fix this :)

Thanks,

/mjt