[xml/sgml-pkgs] Bug#574104: libxml2: considers null bytes as EOF markers

Mike Hommey mh at glandium.org
Tue Mar 16 12:04:29 UTC 2010


On Tue, Mar 16, 2010 at 12:37:05PM +0100, Jakub Wilk wrote:
> * Mike Hommey <mh at glandium.org>, 2010-03-16, 12:23:
> >>libxml2 ignores null bytes (and following bytes) in an XML file:
> >>
> >>$ printf '<test/>\0junk' | xmlwf
> >>STDIN:1:7: not well-formed (invalid token)
> >>
> >>$ printf '<test/>\0junk' | xmllint -
> >><?xml version="1.0"?>
> >><test/>
> >
> >For starters, libxml2 treats your data as UTF-8 and, as such, uses
> >null-terminated strings, so this behaviour is not unexpected.
> 
> Huh? Why should I care about such implementation details? I care
> about behaviour, which is broken. (Anyway, UTF-8 and null-terminated
> strings are *unrelated* concepts.)
> 
> >Secondly, the null character is not allowed in an XML file.
> 
> That's my point. It is not allowed, yet xmllint happily accepts files
> containing it as well-formed.

Oh, sorry for the misunderstanding.

Interestingly, it *does* recognize some brokenness due to null
characters:

$ printf '<test>ju\0nk</test>' |xmllint -
-:1: parser error : Char 0x0 out of allowed range
<test>ju
        ^
-:1: parser error : Premature end of data in tag test line 1
<test>ju
        ^

Mike




