Bug#684708: mdadm: support external metadata arrays correctly

Michael Tokarev mjt at tls.msk.ru
Sun Oct 21 09:46:48 UTC 2012


I'm sorry this took too long.  I was very busy last ~3 weeks.
Now I looked at it all, and have a few comments.  I'm not sure
there's a need to respin/resend this patch, if you agree I'll
take care of it myself.  Comments are inline.

On 02.10.2012 17:20, Miquel van Smoorenburg wrote:
> Package: mdadm
> Version: 3.2.5-1
> 
> [version 2 of the patch]
> 
> The initramfs hook supplied by mdadm doesn't install mdmon. Also, mdmon
> is not included in the .udeb for the installer.
> 
> This means that if you have an array with external metadata (ddf or,
> more widely used, imsm - Intel Matrix Raid) that it will come up
> readonly. This causes the installer to hang or the system not being
> able to boot if root is on that array.
> 
> The attached patch does the following:
> 
> - it makes sure mdadm is included in the initramfs and the udeb package
> - it adds a mdadm-waitidle script that runs just before reboot/halt. For
>   arrays that are still running, it calls mdadm --wait-clean to wait
>   for the array to go idle.  This is needed so that the array is marked
>   clean, otherwise it will start to resync at the next boot.

This is one of the places which I dislike, and there's an easy fix for
my dislike, see comment in the patch itself.

> - it adds a few lines to /etc/init.d/mdadm for the start and stop actions:
>   o start: if a mdmon pidfile is found in /run/mdadm, restart mdmon

This is interesting.  mdmon has --takeover and --offroot flags.  But
apparently there's a bug in mdmon which prevents it from working in
this situation -- when initially mdmon has been started with --offroot
(which replaces command name in ps(1) from mdmon to @dmon -- which, BTW,
should be sufficient for action similar to putting it to sendsigs.omit.d),
subsequent mdmon --takeover does not work since it looks for mdmon not
@dmon.  It is fixed in to-be-released upstream bugfix release of mdadm package.

At least we shoudl be careful here.

>   o stop: link pidfiles of mdmon processes into /run/sendsigs.omit.d,
>     and make sure that happens before sendsigs runs.

Ok.

> - RUNDIR in /etc/init.d/mdadm is set to /run. This is sure to be ok:
>   mdadm itself is already compiled with rundir hardcoded to /run.

I think it is a bug actually, which should be addressed separately.
I'm not sure it is okay to use /var/run in there in this package,
since /var might be mounted AFTER md arrays are assembled.  But this
appears to be just mdadm monitor, not raid array assembly, so should
work either way I think...

> Note that I've made sure that this actually doesn't do /anything/
> if you do not have running MD arrays with external metadata. Iow,
> this should not break anything, or cause regressions.
> 
> I have added support for installation on Intel Matrix raid (imsm)
> arrays using mdadm to d-i, and I'll be sending patches to the debian-boot
> list soon. Please consider this patch for inclusion in wheezy.
> 
[]
> diff -ruN x/mdadm-3.2.5/debian/mdadm-waitidle mdadm-3.2.5/debian/mdadm-waitidle
> --- x/mdadm-3.2.5/debian/mdadm-waitidle	1970-01-01 00:00:00.000000000 +0000
> +++ mdadm-3.2.5/debian/mdadm-waitidle	2012-09-22 19:14:34.236895943 +0000
[]
> +  stop)
> +    # See which arrays have external metadata
> +    arrays=
> +    cd /sys/block
> +    for md in md*
> +    do
> +      metadata="`cat $md/md/metadata_version 2>/dev/null ||:`"
> +      case "$metadata" in
> +        external:[-/]*)
> +          arrays="$arrays $md"
> +          ;;
> +      esac
> +    done
> +
> +    # Only take action for arrays with external metadata
> +    [ -n "$arrays" ] || exit 0
> +    log_action_begin_msg "Waiting for MD arrays to become idle"
> +    sync
> +    err=
> +    for md in $arrays; do
> +      # Stop any sync action
> +      if [ -w $md/md/sync_action ]; then
> +        echo idle > $md/md/sync_action ||:
> +      fi
> +      # Wait for the array to become idle (max 5 secs)
> +      if ! $MDADM --wait-clean /dev/$md >/dev/null 2>&1; then
> +        err=1
> +      fi
> +    done
> +    if [ -n "$err" ]
> +    then
> +      log_action_end_msg 1
> +      sleep 1
> +    else
> +      log_action_end_msg 0
> +    fi

This whole stop logic can be done a bit, or entirely, differently.

First, it is not clear why you say "max 5 secs".  Is it logic
built-in to mdadm itself, since there's no --max-wait or something
argument?

Next, mdadm can wait for MULTIPLE arrays at once.  This way:

    log_action_begin_msg "Waiting for MD arrays to become idle"
    sync
    err=
    wait=
    for md in $arrays; do
      # Stop any sync action
      if [ -w $md/md/sync_action ]; then

#Is this -w needed?  Can it be unwritable?

        echo idle > $md/md/sync_action ||:

#Is this "echo idle" necessary, ie, won't mdadm --wait-clean take care
#of it by itself?  Also, is echoing 'idle' to array which is already idle
#a noop?

      fi
      wait="$wait /dev/$md"
    done
    # Wait for the array to become idle (max 5 secs)
    if ! $MDADM --wait-clean $wait >/dev/null 2>&1; then
    then
      log_action_end_msg 1
      sleep 1
    else
      log_action_end_msg 0
    fi


Also, we can combine two loops - first which searches for
arrays with external metadata and second which idles them -
into one.

And finally, mdadm have one more option here: it can wait
for ALL arrays, like this:

  mdadm --wait-clean --scan

And, according to the manpage, waiting is a no-op for arrays
without external metadata.  I'm not sure how true it is in
practice (I mean, if there are bugs in this context somewhere),
but it looks like it is better to actually ENABLE such mode
unconditionally for ALL arrays, and adding this "idling" is
a good idea TOO: suppose we have native md arrays which are
resyncing/rebuilding at the time of reboot, it might be a
good idea to stop this process gracefully and to wait till
it finishes, before powering the machine off!

So, finally, this whole "stop" case - I'd write it like this:

 stop)
   sync   # XXX it can be a bad idea to sync here?
   # check if there are ANY arrays, and stop any ongoing sync_actions
   wait=
   for sf in /sys/block/md* ; do
     [ "$wait" ] || log_action_begin_msg "Waiting for MD arrays to become idle"
     wait=y
     [ -w $sf/md/sync_action ] && echo idle > $sf/md/sync_action
   done
   if [ ! "$wait" ]; then :
   # Wait for the array to become idle (max 5 secs)
   elif ! $MDADM --wait-clean --scan; then
     log_action_end_msg 1
     sleep 1
   else
     log_action_end_msg 0
   fi
   ;;

If it's okay for you (modulo possible bugs/typos -- I didn't
try to run this code!), I'll use this version.

[]
> --- x/mdadm-3.2.5/debian/mdadm.init	2012-05-10 20:22:16.000000000 +0000
> +++ mdadm-3.2.5/debian/mdadm.init	2012-09-22 19:18:53.401573785 +0000
> @@ -9,7 +9,7 @@
>  ### BEGIN INIT INFO
>  # Provides:          mdadm
>  # Required-Start:    $local_fs $syslog mdadm-raid
> -# Required-Stop:     $local_fs $syslog mdadm-raid
> +# Required-Stop:     $local_fs $syslog sendsigs mdadm-raid

Is this really needed?  It feels to me like inverse logic, ie,
sendsigs should be stopped BEFORE mdadm-raid, no?  Probably I'm
wrong.. ;)

> @@ -54,6 +55,12 @@
>        log_end_msg $?
>        set -e
>      fi
> +    if [ "`echo $RUNDIR/md[0-9]*.pid`" != "$RUNDIR/md[0-9]*.pid" ]
> +    then
> +      log_daemon_msg "Restarting MD external metadata monitor" "mdmon -t --all"
> +      $MDMON -t --all
> +      log_end_msg $?
> +    fi

The bug in mdmon I mentioned earlier..  Oh well.


Ok.

Overall: do we need wait-idle from within udeb/d-i too?

/mjt



More information about the pkg-mdadm-devel mailing list