Bug#780207: default timeouts causing data loss

Chris email.bug at arcor.de
Wed Mar 18 09:51:28 UTC 2015


On Sun, 15 Mar 2015 23:57:44 +0100,
Tobias Frost <tobi at debian.org> wrote:

Thanks for responding.

> Control: Severity 780207 important
> Control: Severity 780162 wishlist
> 
> Hi Chris,
> 
> can you please let us know the link to the upstream discussion?

There are frequent reports and responses coming up (for example, look
at recent threads for "timeout mismatch").

Here is the thread where I got the hint; I then gathered some more
information by looking up more precise responses:
http://thread.gmane.org/gmane.linux.raid/48071/focus=48086
(The README in the .zip contains the last version.)


> From your description, I don't see an imminent risk of data loss which
> warrants an RC bug level. Therefore downgrading to important. 

AFAIK, all drives that have no data recovery timeout (SCT ERC) and
try to recover a read error for longer than the 30 s controller timeout
(most regular non-RAID drives) get completely reset upon a simple block
error, risking not only that block but any open/unwritten data.
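For reference, below is a minimal sketch (not from the bug report) of how one
could inspect both sides of this mismatch on a Linux system, assuming the
standard sysfs timeout file and smartmontools are available; the device name
"sda" and the 180 s value are only example placeholders:

    #!/usr/bin/env python3
    # Sketch: compare the kernel's SCSI command timeout with the drive's
    # SCT ERC (error recovery) setting.  "sda" and 180 s are example values.
    import subprocess
    from pathlib import Path

    DEVICE = "sda"          # example device name, adjust as needed
    FALLBACK_TIMEOUT = 180  # example fallback in seconds for drives without ERC

    timeout_file = Path(f"/sys/block/{DEVICE}/device/timeout")
    kernel_timeout = int(timeout_file.read_text().strip())
    print(f"kernel SCSI command timeout for {DEVICE}: {kernel_timeout} s")

    # Query the drive's error-recovery limit (SCT ERC); needs root privileges.
    erc = subprocess.run(["smartctl", "-l", "scterc", f"/dev/{DEVICE}"],
                         capture_output=True, text=True)
    print(erc.stdout)

    if "Disabled" in erc.stdout:
        # The drive may retry a bad sector far longer than 30 s, so the kernel
        # would reset it mid-recovery; raising the kernel timeout is one
        # commonly suggested workaround.
        print(f"SCT ERC disabled; consider: echo {FALLBACK_TIMEOUT} > {timeout_file}")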

A RAID may be able to recover everything from a redundant disk, but only
if there is no second read error while rebuilding the entire disk. The
risk of an error while reading a large disk is high (there is a
significant number of such rebuild failure reports), and the second
controller reset then causes the rebuild to fail, leaving the array
behind in a corrupt state.
Without RAID, there is no chance of recovery, not only of the defective
block but also of any other open/unwritten data.

That was why I set the bug level to RC, and somehow it still looks
quite RC to me.


