r30534 - in /branches/upstream/libhtml-parser-perl/current: Changes META.yml Parser.pm Parser.xs TODO eg/htextsub lib/HTML/Entities.pm lib/HTML/HeadParser.pm lib/HTML/LinkExtor.pm t/headparser.t
antonio-guest at users.alioth.debian.org
Mon Feb 9 22:12:37 UTC 2009
Author: antonio-guest
Date: Mon Feb 9 22:12:32 2009
New Revision: 30534
URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=30534
Log:
[svn-upgrade] Integrating new upstream version, libhtml-parser-perl (3.60)
Modified:
branches/upstream/libhtml-parser-perl/current/Changes
branches/upstream/libhtml-parser-perl/current/META.yml
branches/upstream/libhtml-parser-perl/current/Parser.pm
branches/upstream/libhtml-parser-perl/current/Parser.xs
branches/upstream/libhtml-parser-perl/current/TODO
branches/upstream/libhtml-parser-perl/current/eg/htextsub
branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm
branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm
branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm
branches/upstream/libhtml-parser-perl/current/t/headparser.t
Modified: branches/upstream/libhtml-parser-perl/current/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Changes?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Changes (original)
+++ branches/upstream/libhtml-parser-perl/current/Changes Mon Feb 9 22:12:32 2009
@@ -1,3 +1,25 @@
+_______________________________________________________________________________
+2009-02-09 Release 3.60
+
+Ville Skytta (5):
+ Spelling fixes.
+ Test multi-value headers.
+ Documentation improvements.
+ Do not terminate head parsing on the <object> element (added in HTML 4.0).
+ Add support for HTML 5 <meta charset> and new HEAD elements.
+
+Damyan Ivanov (1):
+ Short description of the htextsub example
+
+Mike South (1):
+ Suppress warning when encode_entities is called with undef [RT#27567]
+
+Zefram (1):
+ HTML::Parser doesn't compile with perl 5.8.0.
+
+
+
+_______________________________________________________________________________
2008-11-24 Gisle Aas <gisle at ActiveState.com>
Release 3.59
@@ -8,6 +30,7 @@
+_______________________________________________________________________________
2008-11-17 Gisle Aas <gisle at ActiveState.com>
Release 3.58
@@ -21,6 +44,7 @@
+_______________________________________________________________________________
2008-11-16 Gisle Aas <gisle at ActiveState.com>
Release 3.57
@@ -37,11 +61,12 @@
+_______________________________________________________________________________
2007-01-12 Gisle Aas <gisle at ActiveState.com>
Release 3.56
- Cloning of parser state for compatiblity with threads.
+ Cloning of parser state for compatibility with threads.
Fixed by Bo Lindbergh <blgl at hagernas.com>.
Don't require whitespace between declaration tokens.
@@ -49,6 +74,7 @@
+_______________________________________________________________________________
2006-07-10 Gisle Aas <gisle at ActiveState.com>
Release 3.55
@@ -56,7 +82,7 @@
Treat <> at the end of document as text. Used to be
reported as a comment.
- Improved Firefox compatiblity for bad HTML:
+ Improved Firefox compatibility for bad HTML:
- Unclosed <script>, <style> are now treated as empty tags.
- Unclosed <textarea>, <xmp> and <plaintext> treat rest as text.
- Unclosed <title> closes at next tag.
@@ -65,6 +91,7 @@
+_______________________________________________________________________________
2006-04-28 Gisle Aas <gisle at ActiveState.com>
Release 3.54
@@ -78,6 +105,7 @@
+_______________________________________________________________________________
2006-04-27 Gisle Aas <gisle at ActiveState.com>
Release 3.53
@@ -90,6 +118,7 @@
+_______________________________________________________________________________
2006-04-26 Gisle Aas <gisle at ActiveState.com>
Release 3.52
@@ -103,6 +132,7 @@
+_______________________________________________________________________________
2006-03-22 Gisle Aas <gisle at ActiveState.com>
Release 3.51
@@ -120,6 +150,7 @@
+_______________________________________________________________________________
2006-02-14 Gisle Aas <gisle at ActiveState.com>
Release 3.50
@@ -129,6 +160,7 @@
+_______________________________________________________________________________
2006-02-08 Gisle Aas <gisle at ActiveState.com>
Release 3.49
@@ -141,6 +173,7 @@
+_______________________________________________________________________________
2005-12-02 Gisle Aas <gisle at ActiveState.com>
Release 3.48
@@ -257,7 +290,7 @@
garbage with older versions of perl.
Emit warning if entities are decoded and something in the first
- chunk looks like hibit UTF-8. Previously this warning was only
+ chunk looks like hi-bit UTF-8. Previously this warning was only
triggered for documents with BOM.
@@ -344,7 +377,7 @@
Release 3.37
Improved handling of HTML encoded surrogate pairs and illegally
- endoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>.
+ encoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>.
Patch by John Gardiner Myers <jgmyers at proofpoint.com>.
Avoid generating bad UTF8 strings when decoding entities
@@ -599,7 +632,7 @@
Release 3.21
- Fix a memory leak which occured when using filter methods.
+ Fix a memory leak which occurred when using filter methods.
Avoid a few compiler warnings (DEC C):
- Trailing comma found in enumerator list
@@ -1274,7 +1307,7 @@
Faster HTML::LinkExtor by taking advantage of the new
callback interface. The module now also uses URI.pm (instead
- of the old URI::URL) to do URI-absolutations.
+ of the old URI::URL) to absolutize URIs.
Faster HTML::TokeParser by taking advantage of new
accum interface.
@@ -1408,7 +1441,7 @@
instead of raising an exception, and strings like "*STDIN" are not
treated as globs any more.
- HTML::LinkExtor knowns about background attribute of <tables>.
+ HTML::LinkExtor knows about background attribute of <tables>.
Patch by Clinton Wong <clintdw at netcom.com>
HTML::TokeParser will parse large inline strings much faster now.
@@ -1491,7 +1524,7 @@
Release 2.16
- The HTML::Parser could some times break hex entites (like )
+ The HTML::Parser could some times break hex entities (like )
in the middle.
Removed remaining forced dependencies on libwww-perl modules. It
Modified: branches/upstream/libhtml-parser-perl/current/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/META.yml?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/META.yml (original)
+++ branches/upstream/libhtml-parser-perl/current/META.yml Mon Feb 9 22:12:32 2009
@@ -1,6 +1,6 @@
--- #YAML:1.0
name: HTML-Parser
-version: 3.59
+version: 3.60
abstract: HTML parser class
author:
- Gisle Aas <gisle at activestate.com>
@@ -21,7 +21,7 @@
directory:
- t
- inc
-generated_by: ExtUtils::MakeMaker version 6.48
+generated_by: ExtUtils::MakeMaker version 6.4801
meta-spec:
url: http://module-build.sourceforge.net/META-spec-v1.4.html
version: 1.4
Modified: branches/upstream/libhtml-parser-perl/current/Parser.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Parser.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Parser.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/Parser.pm Mon Feb 9 22:12:32 2009
@@ -9,7 +9,7 @@
use strict;
use vars qw($VERSION @ISA);
-$VERSION = "3.59";
+$VERSION = "3.60";
require HTML::Entities;
@@ -334,8 +334,8 @@
=item $p->backquote( $bool )
By default, only ' and " are recognized as quote characters around
-attribute values. MSIE also recognize backquotes for some reason.
-Enabling this attribute provide compatiblity with this behaviour.
+attribute values. MSIE also recognizes backquotes for some reason.
+Enabling this attribute provides compatibility with this behaviour.
=item $p->boolean_attribute_value( $val )
@@ -1200,7 +1200,7 @@
The parser can process raw undecoded UTF-8 sanely if the C<utf8_mode>
is enabled or if the "attr", "@attr" or "dtext" argspecs is avoided.
-=item Parsing string decoded with wrong endianess
+=item Parsing string decoded with wrong endianness
(W) The first character in the document is U+FFFE. This is not a
legal Unicode character but a byte swapped BOM. The result of parsing
Modified: branches/upstream/libhtml-parser-perl/current/Parser.xs
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Parser.xs?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Parser.xs (original)
+++ branches/upstream/libhtml-parser-perl/current/Parser.xs Mon Feb 9 22:12:32 2009
@@ -96,6 +96,10 @@
#define DOWARN (PL_dowarn & G_WARN_ON)
#else
#define DOWARN PL_dowarn
+#endif
+
+#ifndef CLONEf_JOIN_IN
+ #define CLONEf_JOIN_IN 0
#endif
/*
Modified: branches/upstream/libhtml-parser-perl/current/TODO
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/TODO?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/TODO (original)
+++ branches/upstream/libhtml-parser-perl/current/TODO Mon Feb 9 22:12:32 2009
@@ -2,7 +2,7 @@
- limit the length of markup elements that never end. Perhaps by
configurable limits on the length that markup can have and still
- be recongnized. Report stuff as 'text' when this happens?
+ be recognized. Report stuff as 'text' when this happens?
- remove 255 char limit on literal argspec strings
- implement backslash escapes in literal argspec string
- <![%app1;[...]]> (parameter entities)
Modified: branches/upstream/libhtml-parser-perl/current/eg/htextsub
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/eg/htextsub?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/eg/htextsub (original)
+++ branches/upstream/libhtml-parser-perl/current/eg/htextsub Mon Feb 9 22:12:32 2009
@@ -1,4 +1,8 @@
#!/usr/bin/perl -w
+
+# Shows how to mangle all plain text in an HTML document, using an arbitrary
+# Perl expression. Plain text is all text not within a tag declaration, i.e.
+# not in <p ...>, but possibly between <p> and </p>
use strict;
my $code = shift || usage();
Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm Mon Feb 9 22:12:32 2009
@@ -57,7 +57,7 @@
single character strings. If a key has ";" as suffix,
then occurrences in $string are only expanded if properly terminated
with ";". Entities without ";" will be expanded regardless of how
-they are terminated for compatiblity with how common browsers treat
+they are terminated for compatibility with how common browsers treat
entities in the Latin-1 range.
If $expand_prefix is TRUE then entities without trailing ";" in
@@ -139,7 +139,7 @@
@EXPORT = qw(encode_entities decode_entities _decode_entities);
@EXPORT_OK = qw(%entity2char %char2entity encode_entities_numeric);
-$VERSION = "3.57";
+$VERSION = "3.60";
sub Version { $VERSION; }
require HTML::Parser; # for fast XS implemented decode_entities
@@ -446,6 +446,7 @@
sub encode_entities
{
+ return undef unless defined $_[0];
my $ref;
if (defined wantarray) {
my $x = $_[0];
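
Note on the encode_entities hunk above: the added guard returns early for an undefined argument, so the later substitution never sees undef and no "uninitialized value" warning is emitted. The following is a minimal, self-contained sketch of that pattern, not the module's own code. `escape_basic` and its `$guard` flag are hypothetical stand-ins introduced only for illustration, and the function escapes just `&`, `<` and `>`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for encode_entities(): escapes only & < >.
# $guard toggles the early return that the 3.60 hunk adds.
sub escape_basic {
    my ($string, $guard) = @_;
    return undef if $guard && !defined $string;
    my $copy = $string;
    # On an undefined $copy this substitution triggers a warning.
    $copy =~ s/([&<>])/sprintf "&#%d;", ord $1/ge;
    return $copy;
}

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

escape_basic(undef, 0);               # old behaviour: no guard
my $without_guard = @warnings;

@warnings = ();
my $result = escape_basic(undef, 1);  # new behaviour: guard enabled
my $with_guard = @warnings;

print "warnings without guard: $without_guard\n";
print "warnings with guard:    $with_guard\n";
print "guarded result is ", (defined $result ? "defined" : "undef"), "\n";
```

Returning undef rather than an empty string keeps "no input" distinguishable from "empty input" for callers, which appears to be the intent of the change for RT#27567.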
Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm Mon Feb 9 22:12:32 2009
@@ -13,6 +13,8 @@
$p->header('Title') # to access <title>....</title>
$p->header('Content-Base') # to access <base href="http://...">
$p->header('Foo') # to access <meta http-equiv="Foo" content="...">
+ $p->header('X-Meta-Author') # to access <meta name="author" content="...">
+ $p->header('X-Meta-Charset') # to access <meta charset="...">
=head1 DESCRIPTION
@@ -52,9 +54,18 @@
=item X-Meta-Foo:
-All E<lt>meta> elements will initialize headers with the prefix
-"C<X-Meta->" on the name. If the E<lt>meta> element contains a
-C<http-equiv> attribute, then it will be honored as the header name.
+All E<lt>meta> elements containing a C<name> attribute will result in
+headers using the prefix C<X-Meta-> appended with the value of the
+C<name> attribute as the name of the header, and the value of the
+C<content> attribute as the pushed header value.
+
+E<lt>meta> elements containing a C<http-equiv> attribute will result
+in headers as in above, but without the C<X-Meta-> prefix in the
+header name.
+
+E<lt>meta> elements containing a C<charset> attribute will result in
+an C<X-Meta-Charset> header, using the value of the C<charset>
+attribute as the pushed header value.
=back
@@ -76,7 +87,7 @@
use strict;
use vars qw($VERSION $DEBUG);
#$DEBUG = 1;
-$VERSION = "3.59";
+$VERSION = "3.60";
=item $hp = HTML::HeadParser->new
@@ -85,7 +96,7 @@
The object constructor. The optional $header argument should be a
reference to an object that implement the header() and push_header()
methods as defined by the C<HTTP::Headers> class. Normally it will be
-of some class that isa or delegates to the C<HTTP::Headers> class.
+of some class that is a or delegates to the C<HTTP::Headers> class.
If no $header is given C<HTML::HeadParser> will create an
C<HTTP::Header> object by itself (initially empty).
@@ -157,7 +168,14 @@
# SCRIPT* & META* & LINK*">
#
# <!ELEMENT HEAD O O (%head.content)>
-
+#
+# From HTML 4.01:
+#
+# <!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT">
+# <!ENTITY % head.content "TITLE & BASE?">
+# <!ELEMENT HEAD O O (%head.content;) +(%head.misc;)>
+#
+# Added in HTML 5: noscript, eventsource, command
sub start
{
@@ -167,8 +185,15 @@
if ($tag eq 'meta') {
my $key = $attr->{'http-equiv'};
if (!defined($key) || !length($key)) {
- return unless $attr->{'name'};
- $key = "X-Meta-\u$attr->{'name'}";
+ if ($attr->{name}) {
+ $key = "X-Meta-\u$attr->{name}";
+ } elsif ($attr->{charset}) { # HTML 5 <meta charset="...">
+ $key = "X-Meta-Charset";
+ $self->{header}->push_header($key => $attr->{charset});
+ return;
+ } else {
+ return;
+ }
}
$self->{'header'}->push_header($key => $attr->{content});
} elsif ($tag eq 'base') {
@@ -178,7 +203,8 @@
# This is a non-standard header. Perhaps we should just ignore
# this element
$self->{'header'}->push_header(Isindex => $attr->{prompt} || '?');
- } elsif ($tag =~ /^(?:title|script|style)$/) {
+ } elsif ($tag =~ /^(?:title|(?:no)?script|style|object
+ |eventsource|command)$/x) {
# Just remember tag. Initialize header when we see the end tag.
$self->{'tag'} = $tag;
} elsif ($tag eq 'link') {
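
Note on the HeadParser.pm hunks above: the header name chosen for a <meta> element now depends on which of `http-equiv`, `name` or `charset` is present, in that order. The sketch below mirrors that decision in a standalone function so it can be run without HTML::Parser installed. `meta_header` is a hypothetical helper (the real code pushes headers from inside `start()`), and the attribute values are made-up sample data. The tag pattern is copied from the diff.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the <meta> branch of start() in the diff.
# Returns (header name, value), or an empty list when the element is ignored.
sub meta_header {
    my ($attr) = @_;
    my $key = $attr->{'http-equiv'};
    if (!defined($key) || !length($key)) {
        if ($attr->{name}) {
            $key = "X-Meta-\u$attr->{name}";
        }
        elsif ($attr->{charset}) {    # HTML 5 <meta charset="...">
            return ('X-Meta-Charset', $attr->{charset});
        }
        else {
            return;
        }
    }
    return ($key, $attr->{content});
}

# Tag pattern from the diff: start tags that are only remembered
# until the matching end tag is seen.
my $head_tag = qr/^(?:title|(?:no)?script|style|object
                     |eventsource|command)$/x;

for my $attr (
    { 'http-equiv' => 'Expires', content => 'Soon' },
    { name    => 'author', content => 'Jane Doe' },   # sample value
    { charset => 'ISO-8859-1' },
    { property => 'unused' },                          # no usable attribute
) {
    my @pair = meta_header($attr);
    print @pair ? "$pair[0]: $pair[1]\n" : "(ignored)\n";
}

print "$_ => ", ($_ =~ $head_tag ? "remembered" : "not matched"), "\n"
    for qw(noscript object body);
```

Because `http-equiv` is checked first, an element carrying both `http-equiv` and `charset` would still produce the unprefixed header, which matches the order of the branches in the patch.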
Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm Mon Feb 9 22:12:32 2009
@@ -2,7 +2,7 @@
require HTML::Parser;
@ISA = qw(HTML::Parser);
-$VERSION = "3.57";
+$VERSION = "3.60";
=head1 NAME
@@ -104,7 +104,7 @@
=item $p->links
Returns a list of all links found in the document. The returned
-values will be anonymous arrays with the follwing elements:
+values will be anonymous arrays with the following elements:
[$tag, $attr => $url1, $attr2 => $url2,...]
@@ -155,7 +155,7 @@
}
# Make the parser. Unfortunately, we don't know the base yet
- # (it might be diffent from $url)
+ # (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);
# Request document and parse it as it arrives
Modified: branches/upstream/libhtml-parser-perl/current/t/headparser.t
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/t/headparser.t?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/t/headparser.t (original)
+++ branches/upstream/libhtml-parser-perl/current/t/headparser.t Mon Feb 9 22:12:32 2009
@@ -1,7 +1,7 @@
#!perl -w
use strict;
-use Test::More tests => 13;
+use Test::More tests => 15;
{ package H;
sub new { bless {}, shift; }
@@ -55,10 +55,14 @@
ignore this
</script>
+<noscript> ... and this </noscript>
+
+<object classid="foo">
<base href="http://www.sn.no">
<meta name="Keywords" content="test, test, test,...">
<meta name="Keywords" content="more">
+<meta charset="ISO-8859-1"><!-- HTML 5 -->
Dette er vanlig tekst. Denne teksten definerer også slutten på
<head> delen av dokumentet.
@@ -91,6 +95,8 @@
like($p->header('Title'), qr/Å være eller å ikke være/);
is($p->header('Expires'), 'Soon');
is($p->header('Content-Base'), 'http://www.sn.no');
+is_deeply($p->header('X-Meta-Keywords'), ['test, test, test,...', 'more']);
+is($p->header('X-Meta-Charset'), 'ISO-8859-1');
like($p->header('Link'), qr/<mailto:gisle\@aas.no>/);
# This header should not be present because the head ended