[Pkg-haskell-maintainers] Bug#748125: System.Posix.Directory.readDirStream can return strings that S.P.Files.getFileStatus cannot use

Joachim Breitner nomeata at debian.org
Thu May 15 09:12:55 UTC 2014


Control: forwarded 748125 https://ghc.haskell.org/trac/ghc/ticket/9114

Dear Robert,

On Thursday, 15.05.2014, at 10:34 +0200, Robert Bihlmeyer wrote:
> Interestingly, most of the invalid UTF-8 I tried survived the roundtrip
> through String. What doesn't work in these cases is outputting this
> String -- but I wouldn't expect it to. But getFileStatus accepts the
> String and stats the right file (can be proven with "strace -fe stat"
> for example).
> 
> Up to now I found exactly one class of byte sequences that do not work:
> illegal (sub-optimal, i.e. overlong) encodings of ASCII characters. The
> attached tar contains a filename with the two bytes C0 and B7 followed by
> '.txt'. C0 B7 is an invalid encoding of 0x37, i.e. '7'.
> 
> It looks like GHC accepts the invalid encoding and stores the result as
> the normal character '7'. The error points in this direction:
> 
>   dirtest.hs: 7.txt: getFileStatus: does not exist (No such file or directory)
> 
> In contrast, a sub-optimal encoding of 'ö' (U+00F6) as E0 83 B6
> works fine, as do the numerous other illegal combinations of
> high-bit-set characters I tried.
> 
> So my assumption is that there is special casing if the result of UTF-8
> decoding is an ASCII character.
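
To make the byte arithmetic above concrete, here is a minimal sketch of a
deliberately naive UTF-8 decoder that merges the payload bits without
rejecting overlong forms. The names decodeNaive, isCont and payload are
made up for illustration; this is not GHC's decoder, only a model of the
behaviour described in the report.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: is this byte a continuation byte (10xxxxxx)?
isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Hypothetical helper: the six payload bits of a continuation byte.
payload :: Word8 -> Int
payload b = fromIntegral b .&. 0x3F

-- Naive decoder for a single two- or three-byte sequence.
-- It combines the payload bits and performs NO overlong check,
-- so it accepts encodings that a strict decoder must reject.
decodeNaive :: [Word8] -> Maybe Char
decodeNaive [b0, b1]
  | b0 .&. 0xE0 == 0xC0, isCont b1 =
      Just (chr (((fromIntegral b0 .&. 0x1F) `shiftL` 6) .|. payload b1))
decodeNaive [b0, b1, b2]
  | b0 .&. 0xF0 == 0xE0, isCont b1, isCont b2 =
      Just (chr (((fromIntegral b0 .&. 0x0F) `shiftL` 12)
                 .|. (payload b1 `shiftL` 6)
                 .|. payload b2))
decodeNaive _ = Nothing
```

With this model, decodeNaive [0xC0, 0xB7] yields '7' (0x37) and
decodeNaive [0xE0, 0x83, 0xB6] yields 'ö' (U+00F6), matching the two code
points named in the report. A strict decoder would reject both, because a
two-byte form must encode at least U+0080 and a three-byte form at least
U+0800.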

Indeed it seems that GHC is prepared to round-trip invalid sequences
through Unicode – see the documentation at
https://hackage.haskell.org/package/base-4.7.0.0/docs/GHC-IO-Encoding.html#v:mkTextEncoding
and the source, which indicates that the roundtrip mode is used for
filenames.
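
As a rough way to probe this outside of the unix package, the following
sketch pushes raw bytes through a String and back using the encoding name
"UTF-8//ROUNDTRIP" with the marshalling functions from GHC.Foreign. The
helper name roundTrip is made up here, and whether this exercises exactly
the same code path as readDirStream and getFileStatus is an assumption.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (mkTextEncoding)

-- Hypothetical helper: decode raw bytes to a String with the roundtrip
-- variant of UTF-8, then encode that String back to bytes.
roundTrip :: [Word8] -> IO [Word8]
roundTrip bytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  -- bytes -> String
  str <- withArrayLen bytes $ \n p ->
    GHC.peekCStringLen enc (castPtr p, n)
  -- String -> bytes
  GHC.withCStringLen enc str $ \(p, n) ->
    peekArray n (castPtr p)

main :: IO ()
main = do
  -- The file name from the report: C0 B7 followed by ".txt".
  let name = [0xC0, 0xB7, 0x2E, 0x74, 0x78, 0x74]
  out <- roundTrip name
  putStrLn (if out == name then "round-trip preserved the bytes"
                           else "round-trip changed the bytes: " ++ show out)
```

The result for the C0 B7 name depends on the version of base in use, so no
expected output is given; plain ASCII input should always come back
unchanged.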

So the bug here is that the roundtripping mechanism does not work as
intended for overlong encodings of ASCII characters. Filed as
https://ghc.haskell.org/trac/ghc/ticket/9114.

Thanks for spotting this!
Joachim




-- 
Joachim "nomeata" Breitner
Debian Developer
  nomeata at debian.org | ICQ# 74513189 | GPG-Keyid: F0FBF51F
  JID: nomeata at joachim-breitner.de | http://people.debian.org/~nomeata
