Bug#826907: mdadm: please configure either component device timeout or scterc to guard against scsi layer timeouts

Henrique de Moraes Holschuh hmh at debian.org
Fri Jun 10 09:20:33 UTC 2016


On Fri, 10 Jun 2016, Michael Tokarev wrote:
> 10.06.2016 04:31, Henrique de Moraes Holschuh wrote:
> > One must enable SCTERC (e.g. with smartctl -l scterc,70,500) before
> > starting the array (initramfs included).  Fortunately, support for SCTERC
> > can be detected, and it can be queried, so one would only mess with it
> > when unset.  A longer write timeout might help ensure the HDD has time
> > to relocate the sector (there's never a good reason for an HDD with
> > spare sectors still available to return a write error other than a
> > SCTERC write timeout, or the spare tracks going bad/full).
> > 
> > Alternatively, mdadm could increase the timeout of sat/ata component
> > devices in the scsi layer, from 30s to something like 120s through
> > /sys/block/###/device/timeout.  This avoids worse data-loss in many
> > cases, but md will hang for far longer when the component device really
> > has gone to lalalala land...
> 
> I don't think this is a job of mdadm to tweak devices like this.
> Devices can be programmed independently, and with different timeouts
> for different functions.

Not if you want to avoid major data loss on the RAID side, and know
anything about it.
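
To make the scsi-layer fallback quoted above concrete, here is a minimal
sketch of raising the per-device command timeout through sysfs.  The
function name, the 120s default and the sysfs_root parameter (there only so
the logic can be tried against a scratch directory) are my own
assumptions, not anything mdadm provides today.  It only ever raises the
value, so an admin's larger setting is left alone.

```python
from pathlib import Path


def raise_scsi_timeout(dev, seconds=120, sysfs_root="/sys"):
    """Raise the SCSI command timeout of `dev` (e.g. "sda") to `seconds`.

    Hypothetical helper: only writes when the current value is lower,
    and returns the timeout in effect afterwards.
    """
    path = Path(sysfs_root) / "block" / dev / "device" / "timeout"
    current = int(path.read_text().strip())
    if current >= seconds:
        # Already at least as long as requested: do not touch it.
        return current
    path.write_text(f"{seconds}\n")
    return seconds
```

On a real system this needs root, and it would have to run for every
component device before the array is assembled.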

> Further, it is not a strong requirement to do so in initramfs, this

Here we disagree... if one is going to fix this *anywhere*, one should
do it properly.

> One more thing.  Most consumer drives these days don't support SCTERC.

NAS drives are consumer drives, and almost all of them support SCTERC,
*but it defaults to disabled on power-up*.
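
Since the setting does not survive a power cycle, something has to
query it and re-enable it at every boot.  A rough sketch of the "only
touch it when unset" logic follows.  The function names are mine, and
the parsing assumes smartctl prints "Read:"/"Write:" lines with either
a number or "Disabled", and a "not supported" message otherwise; check
that against your smartctl version.  The values are in units of 100ms,
so 70 means 7 seconds.

```python
import re
import subprocess


def parse_scterc(output):
    """Classify `smartctl -l scterc` output (assumed format, see above).

    Returns "unsupported", "disabled" or "enabled".
    """
    if "not supported" in output:
        return "unsupported"
    values = re.findall(r"^\s*(?:Read|Write):\s*(\S+)", output, re.MULTILINE)
    if not values:
        # Nothing recognisable: treat as unsupported and leave it alone.
        return "unsupported"
    if any(v.lower() == "disabled" for v in values):
        return "disabled"
    return "enabled"


def enable_command(dev, read_ds=70, write_ds=70):
    """Build the smartctl command that sets SCTERC (deciseconds)."""
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", dev]


def ensure_scterc(dev, read_ds=70, write_ds=70):
    """Enable SCTERC on `dev` only if it is supported and currently unset."""
    query = subprocess.run(["smartctl", "-l", "scterc", dev],
                           capture_output=True, text=True)
    state = parse_scterc(query.stdout)
    if state == "disabled":
        subprocess.run(enable_command(dev, read_ds, write_ds))
    return state
```

Drives that report "unsupported" are the ones that would need the
longer scsi-layer timeout instead.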

> I have an almost anecdotal situation here on our own machine.  It had 4
> 1TB WD Black drives bought some years ago.  One drive failed, and we
> bought a replacement, of the same model.  It reports the SAME FIRMWARE
> VERSION as the older drives.  But it doesn't support SCTERC anymore,
> while the older drives do.  Go figure.

No need to: we all know about the (forced) market segmentation that was
done in recent years.  AFAIK, WD has SCTERC on the Red, Gold, Purple,
Re, We, Se, Ae lines, and doesn't have it (or removed it from) Black,
Blue or Green.

> On the other hand, many "enterprise" machines don't support
> JBOD mode at all, enabling only hardware raid usage.  I see fewer and
> fewer machines supporting JBOD mode without additional licenses.

Which people pay for, to run ZFS.

> Having said that, I see less and less usefulness of supporting SCTERC
> in the first place.  For consumers it is less useful because modern
> drives don't support it, and for enterprise it is because the machines
> don't support JBOD.  Oh well.

Like I said, almost every modern NAS disk does.  And Debian is widely
used with these disks: they *are* consumer disks.

Is it possible to add hooks to mdadm that could be used to set up array
components through external scripts?  Or events?

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


