r29 - in mdadm/trunk/debian: . patches

madduck at users.alioth.debian.org
Mon Jul 24 22:08:04 UTC 2006


Author: madduck
Date: 2006-07-24 22:08:03 +0000 (Mon, 24 Jul 2006)
New Revision: 29

Added:
   mdadm/trunk/debian/patches/99-raid5-vs-raid10-doc.dpatch
Modified:
   mdadm/trunk/debian/changelog
   mdadm/trunk/debian/mdadm-raid
   mdadm/trunk/debian/mdadm.docs
Log:
* Removed the code writing auto-detected devices to /var, which was silly
  since /var isn't necessarily mounted yet by the time mdadm-raid is called.
  Thanks to Mau for pointing this out.
* Added questionable RAID5_versus_RAID10.txt document.

Modified: mdadm/trunk/debian/changelog
===================================================================
--- mdadm/trunk/debian/changelog	2006-07-24 07:55:27 UTC (rev 28)
+++ mdadm/trunk/debian/changelog	2006-07-24 22:08:03 UTC (rev 29)
@@ -6,11 +6,15 @@
   * checkarray: check for presence of active RAID arrays and give an
     appropriate error if there are none present (closes: #379019).
   * checkarray: skip sync for RAID0 devices (closes: #379352).
+  * Removed the code writing auto-detected devices to /var, which was silly
+    since /var isn't necessarily mounted yet by the time mdadm-raid is called.
+    Thanks to Mau for pointing this out.
+  * Added questionable RAID5_versus_RAID10.txt document.
   * Updated debconf translations:
     - Japanese by Hideki Yamane, thanks!
     - French by Florentin Duneau, thanks! (closes: #379511)
 
- -- martin f. krafft <madduck at debian.org>  Mon, 24 Jul 2006 07:54:10 +0100
+ -- martin f. krafft <madduck at debian.org>  Mon, 24 Jul 2006 23:07:33 +0100
 
 mdadm (2.5.2-7) unstable; urgency=low
 

Modified: mdadm/trunk/debian/mdadm-raid
===================================================================
--- mdadm/trunk/debian/mdadm-raid	2006-07-24 07:55:27 UTC (rev 28)
+++ mdadm/trunk/debian/mdadm-raid	2006-07-24 22:08:03 UTC (rev 29)
@@ -10,8 +10,6 @@
 CONFIG=/etc/mdadm/mdadm.conf
 ALTCONFIG=/etc/mdadm.conf
 DEBIANCONFIG=/etc/default/mdadm
-RUNDIR=/var/run/mdadm
-AUTOSTARTED_DEVICES=$RUNDIR/autostarted-devices
 
 test -x $MDADM || exit 0
 
@@ -78,8 +76,6 @@
       fi
 
       if [ -f $CONFIG ] || [ -f $ALTCONFIG ]; then
-        mkdir -p $RUNDIR
-
         # ugly hack because shell sucks
         IFSOLD=${IFS:-}
         IFS='
@@ -115,12 +111,10 @@
 
             *'has been started with '[[:digit:]]*' drives.')
               log_dev 0 $1 "started [$6/$6]"
-              echo $1 >> $AUTOSTARTED_DEVICES
               ;;
 
             *'has been started with '[[:digit:]]*' drives (out of '[[:digit:]]*').')
               log_dev 0 $1 "degraded [$6/${10%).}]"
-              echo $1 >> $AUTOSTARTED_DEVICES
               ;;
 
             *'assembled from '[[:digit:]]*' drive's#' - not enough to start the array.')

Modified: mdadm/trunk/debian/mdadm.docs
===================================================================
--- mdadm/trunk/debian/mdadm.docs	2006-07-24 07:55:27 UTC (rev 28)
+++ mdadm/trunk/debian/mdadm.docs	2006-07-24 22:08:03 UTC (rev 29)
@@ -2,3 +2,4 @@
 debian/README.mdrun
 md.txt
 rootraiddoc.97.html
+RAID5_versus_RAID10.txt

Added: mdadm/trunk/debian/patches/99-raid5-vs-raid10-doc.dpatch
===================================================================
--- mdadm/trunk/debian/patches/99-raid5-vs-raid10-doc.dpatch	2006-07-24 07:55:27 UTC (rev 28)
+++ mdadm/trunk/debian/patches/99-raid5-vs-raid10-doc.dpatch	2006-07-24 22:08:03 UTC (rev 29)
@@ -0,0 +1,182 @@
+#! /bin/sh /usr/share/dpatch/dpatch-run
+## 99-raid5-vs-raid10-doc.dpatch by martin f. krafft <madduck at debian.org>
+##
+## All lines beginning with `## DP:' are a description of the patch.
+## DP: Add the RAID5_versus_RAID10.txt document from BAARF.
+
+@DPATCH@
+diff -urNad trunk~/RAID5_versus_RAID10.txt trunk/RAID5_versus_RAID10.txt
+--- trunk~/RAID5_versus_RAID10.txt	1970-01-01 01:00:00.000000000 +0100
++++ trunk/RAID5_versus_RAID10.txt	2006-07-24 23:06:52.373326792 +0100
+@@ -0,0 +1,171 @@
++# from http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt
++#
++# note: I, the Debian maintainer, do not agree with much of what's written
++# here, but it's a good argument the author is putting forth. In the end, the
++# RAID level you choose depends on your needs only.
++#
++
++RAID5 versus RAID10 (or even RAID3 or RAID4)
++
++First let's get on the same page so we're all talking about apples.
++
++What is RAID5?
++
++OK, here is the deal: RAID5 uses ONLY ONE parity drive per stripe, and many
++RAID5 arrays are 5 drives (if your counts are different, adjust the
++calculations appropriately): 4 data and 1 parity (though it is not a single
++drive that holds all of the parity as in RAID 3 & 4, but read on).  If you
++have 10 drives of say 20GB each, for 200GB, RAID5 will use 20% for parity
++(assuming you set it up as two 5 drive arrays), so you will have 160GB of
++storage.  Now since RAID10, like mirroring (RAID1), uses 1 (or more) mirror
++drive for each primary drive, you are using 50% for redundancy, so to get
++the same 160GB of storage you will need 8 pairs, or 16 20GB drives, which
++is why RAID5 is so popular.  This intro is just to put things into
++perspective.
++
++RAID5 is physically a stripe set like RAID0 but with data recovery
++included.  RAID5 reserves one disk block out of each stripe block for
++parity data.  The parity block contains an error correction code which can
++correct any error in the RAID5 block, in effect it is used in combination
++with the remaining data blocks to recreate any single missing block, gone
++missing because a drive has failed.  The innovation of RAID5 over RAID3 &
++RAID4 is that the parity is distributed on a round robin basis so that
++there can be independent reading of different blocks from the several
++drives.  This is why RAID5 became more popular than RAID3 & RAID4 which
++must synchronously read the same block from all drives together.  So, if
++Drive2 fails, blocks 1,2,4,5,6 & 7 are data blocks on this drive and blocks
++3 and 8 are parity blocks on this drive.  So that means that the parity on
++Drive5 will be used to recreate the data block from Disk2 if block 1 is
++requested before a new drive replaces Drive2 or during the rebuilding of
++the new Drive2 replacement.  Likewise the parity on Drive1 will be used to
++repair block 2 and the parity on Drive3 will repair block 4, etc.  For block
++2 all the data is safely on the remaining drives but during the rebuilding
++of Drive2's replacement a new parity block will be calculated from the
++block 2 data and will be written to Drive 2.
++
++Now when a disk block is read from the array the RAID software/firmware
++calculates which RAID block contains the disk block, which drive the disk
++block is on and which drive contains the parity block for that RAID block
++and reads ONLY the one data drive.  It returns the data block.  If you
++later modify the data block it recalculates the parity by subtracting the
++old block and adding in the new version then in two separate operations it
++writes the data block followed by the new parity block.  To do this it must
++first read the parity block from whichever drive contains the parity for
++that stripe block and reread the unmodified data for the updated block from
++the original drive.  This read-read-write-write is known as the RAID5 write
++penalty: since these two writes are sequential and synchronous, the write
++system call cannot return until the reread and both writes complete, for
++safety, so writing to RAID5 is up to 50% slower than RAID0 for an array of
++the same capacity.  (Some software RAID5's avoid the re-read by keeping an
++unmodified copy of the original block in memory.)
++
++Now what is RAID10:
++
++RAID10 is one of the combinations of RAID1 (mirroring) and RAID0
++(striping) which are possible.  There used to be confusion about what
++RAID01 or RAID10 meant and different RAID vendors defined them
++differently.  About five years or so ago I proposed the following standard
++language which seems to have taken hold.  When N mirrored pairs are
++striped together this is called RAID10 because the mirroring (RAID1) is
++applied before striping (RAID0).  The other option is to create two stripe
++sets and mirror them one to the other; this is known as RAID01 (because
++the RAID0 is applied first).  In either a RAID01 or RAID10 system each and
++every disk block is completely duplicated on its drive's mirror.
++Performance-wise both RAID01 and RAID10 are functionally equivalent.  The
++difference comes in during recovery where RAID01 suffers from some of the
++same problems I will describe affecting RAID5 while RAID10 does not.
++
++Now if a drive in the RAID5 array dies, is removed, or is shut off, data is
++returned by reading the blocks from the remaining drives and calculating
++the missing data using the parity, assuming the defunct drive is not the
++parity block drive for that RAID block.  Note that it takes 4 physical
++reads to replace the missing disk block (for a 5 drive array) for four out
++of every five disk blocks leading to a 64% performance degradation until
++the problem is discovered and a new drive can be mapped in to begin
++recovery.  Performance is degraded further during recovery because all
++drives are being actively accessed in order to rebuild the replacement
++drive (see below). 
++
++If a drive in the RAID10 array dies, data is returned from its mirror drive
++in a single read with only minor (6.25% on average for a 4 pair array as a
++whole) performance reduction when two non-contiguous blocks are needed from
++the damaged pair (since the two blocks cannot be read in parallel from both
++drives) and none otherwise.
++
++One begins to get an inkling of what is going on and why I dislike RAID5,
++but, as they say on late night info-mercials, there's more.
++
++What's wrong besides a bit of performance I don't know I'm missing?
++
++OK, so that brings us to the final question of the day which is: What is
++the problem with RAID5?  It does recover a failed drive right?  So writes
++are slower, I don't do enough writing to worry about it and the cache
++helps a lot also, I've got LOTS of cache!  The problem is that despite the
++improved reliability of modern drives and the improved error correction
++codes on most drives, and even despite the additional 8 bytes of error
++correction that EMC puts on every Clariion drive disk block (if you are
++lucky enough to use EMC systems), it is more than a little possible that a
++drive will become flaky and begin to return garbage.  This is known as
++partial media failure.  Now SCSI controllers reserve several hundred disk
++blocks to be remapped to replace fading sectors with unused ones, but if
++the drive is going these will not last very long and will run out and SCSI
++does NOT report correctable errors back to the OS!  Therefore you will not
++know the drive is becoming unstable until it is too late and there are no
++more replacement sectors and the drive begins to return garbage.  [Note
++that the recently popular IDE/ATA drives do not (TMK) include bad sector
++remapping in their hardware so garbage is returned that much sooner.]
++When a drive returns garbage, RAID5 will not notice, since it does not EVER
++check parity on read (RAID3 & RAID4 do, BTW, and both perform better for
++databases than RAID5 to boot); when you write the garbage sector back,
++garbage parity will be calculated and your RAID5 integrity is lost!
++Similarly, if a drive fails and one of the remaining drives is flaky, the
++replacement will be rebuilt with garbage as well, propagating the problem
++to two blocks instead of just one.
++
++Need more?  During recovery, read performance for a RAID5 array is
++degraded by as much as 80%.  Some advanced arrays let you configure the
++preference more toward recovery or toward performance.  However, doing so
++will increase recovery time and increase the likelihood of losing a second
++drive in the array before recovery completes resulting in catastrophic
++data loss.  RAID10, on the other hand, will only be recovering one drive
++out of 4 or more pairs, with ONLY reads from the recovering pair degraded,
++making the performance hit to the array overall only about 20%!
++Plus there is no parity calculation time used during recovery - it's a
++straight data copy.
++
++What about that thing about losing a second drive?  Well, with RAID10 there
++is no danger unless the one mirror that is recovering also fails, and
++that's 80% or more less likely than the chance that any other drive in a
++RAID5 array will fail!  And since most multiple drive failures are caused
++by undetected manufacturing defects, you can make even this possibility
++vanishingly small by making sure to mirror every drive with one from a
++different manufacturer's lot number.  ("Oh", you say, "this scenario does
++not seem likely!"  Pooh, we lost 50 drives over two weeks when a batch of
++200 IBM drives began to fail.  IBM discovered that the single lot of
++drives would have their spindle bearings freeze after so many hours of
++operation.  Fortunately, due in part to RAID10 and in part to a herculean
++effort by DG techs and our own people over 2 weeks, no data was lost.
++HOWEVER, one RAID5 filesystem was a total loss after a second drive failed
++during recovery.  Fortunately everything was on tape.)
++
++Conclusion?  For safety and performance favor RAID10 first, RAID3 second,
++RAID4 third, and RAID5 last!  The original reason for the RAID2-5 specs
++was that the high cost of disks was making RAID1, mirroring, impractical.
++That is no longer the case!  Drives are commodity priced, even the biggest
++fastest drives are cheaper in absolute dollars than drives were then and
++cost per MB is a tiny fraction of what it was.  Does RAID5 make ANY sense
++anymore?  Obviously I think not.
++
++To put things into perspective: If a drive costs $1000US (and most are far
++less expensive than that) then switching from a 4 pair RAID10 array to a 5
++drive RAID5 array will save 3 drives or $3000US.  What is the cost of
++overtime, wear and tear on the technicians, DBAs, managers, and customers
++of even a recovery scare?  What is the cost of reduced performance and
++possibly reduced customer satisfaction?  Finally what is the cost of lost
++business if data is unrecoverable?  I maintain that the drives are FAR
++cheaper!  Hence my mantra:
++
++NO RAID5!  NO RAID5!  NO RAID5!  NO RAID5!  NO RAID5!  NO RAID5!  NO RAID5!
++
++Art S. Kagel
++


Property changes on: mdadm/trunk/debian/patches/99-raid5-vs-raid10-doc.dpatch
___________________________________________________________________
Name: svn:executable
   + *
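
As an aside, the capacity arithmetic the RAID5_versus_RAID10.txt document walks
through (ten 20GB drives arranged as two 5-drive RAID5 arrays versus mirrored
pairs) can be checked with a few lines of shell. This is only a sketch of the
document's own example, nothing mdadm-specific, and the variable names are made
up for illustration:

#!/bin/sh
# Sketch of the capacity figures used in the document above: ten 20GB drives,
# two 5-drive RAID5 arrays (one parity drive each) versus mirrored pairs.
DRIVE_GB=20
DRIVES=10
RAW_GB=$((DRIVES * DRIVE_GB))                       # 200GB of raw space
RAID5_USABLE_GB=$((RAW_GB * 4 / 5))                 # 1 parity drive in 5 -> 160GB
RAID10_DRIVES=$((2 * RAID5_USABLE_GB / DRIVE_GB))   # everything mirrored -> 16 drives
echo "raw: ${RAW_GB}GB, RAID5 usable: ${RAID5_USABLE_GB}GB"
echo "drives needed for ${RAID5_USABLE_GB}GB of RAID10: ${RAID10_DRIVES}"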




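Similarly, the parity mechanics the document describes (one XOR parity block
per stripe, rebuilding a missing block from the survivors, and the
read-read-write-write update penalty) can be illustrated with a toy model in
shell. Small integers stand in for whole disk blocks and the names are invented
for the example; this is not how mdadm's RAID5 code is implemented:

#!/bin/sh
# Toy model of the RAID5 behaviour described in the document; integers stand
# in for disk blocks, which a real array would XOR byte by byte.
D1=23 D2=42 D3=7 D4=99                # four data blocks in one stripe
P=$((D1 ^ D2 ^ D3 ^ D4))              # parity block for the stripe
# Rebuild: if the drive holding D2 dies, XOR the surviving blocks with parity.
echo "lost D2=$D2, rebuilt D2=$((D1 ^ D3 ^ D4 ^ P))"
# Write penalty: updating D2 first rereads old D2 and the old parity (two
# reads), then writes the new data block and the new parity block (two writes).
NEWD2=55
P=$((P ^ D2 ^ NEWD2))                 # new parity = old parity ^ old data ^ new data
D2=$NEWD2
echo "new parity: $P, full recompute: $((D1 ^ D2 ^ D3 ^ D4))"

A RAID10 rebuild, by contrast, involves no parity calculation at all; as the
document notes, it is a straight copy from the surviving mirror.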