[xml/sgml-pkgs] Bug#574104: libxml2: considers null bytes as EOF markers

Jakub Wilk jwilk at debian.org
Tue Mar 16 12:25:41 UTC 2010


* Mike Hommey <mh at glandium.org>, 2010-03-16, 13:04:
>> >>libxml2 ignores null bytes (and following bytes) in an XML file:
>> >>
>> >>$ printf '<test/>\0junk' | xmlwf
>> >>STDIN:1:7: not well-formed (invalid token)
>> >>
>> >>$ printf '<test/>\0junk' | xmllint -
>> >><?xml version="1.0"?>
>> >><test/>
>> >
>> >For starters, libxml2 treats your data as UTF-8, and as such uses
>> >null-terminated strings, so this is not unexpected behaviour.
>>
>> Huh? Why should I care about such implementation details? I care
>> about behaviour, which is broken. (Anyway, UTF-8 and null-terminated
>> strings are *unrelated* concepts.)
>>
>> >Secondly, the null character is not allowed in an XML file.
>>
>> That's my point. It is not allowed, yet xmllint happily accepts files
>> containing it as well-formed.
>
>Oh, sorry for the misunderstanding.

No problem. :)

>Interestingly, it *does* recognize some brokenness due to null
>characters:
>
>$ printf '<test>ju\0nk</test>' |xmllint -
>-:1: parser error : Char 0x0 out of allowed range
><test>ju
>        ^
>-:1: parser error : Premature end of data in tag test line 1
><test>ju
>        ^

Also, the stream parser deals with null bytes correctly:

$ printf '<test/>\0junk' | xmllint --stream -
-:1: parser error : Extra content at the end of the document
<test/>
        ^
- : failed to parse

-- 
Jakub Wilk