r30534 - in /branches/upstream/libhtml-parser-perl/current: Changes META.yml Parser.pm Parser.xs TODO eg/htextsub lib/HTML/Entities.pm lib/HTML/HeadParser.pm lib/HTML/LinkExtor.pm t/headparser.t

antonio-guest at users.alioth.debian.org
Mon Feb 9 22:12:37 UTC 2009


Author: antonio-guest
Date: Mon Feb  9 22:12:32 2009
New Revision: 30534

URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=30534
Log:
[svn-upgrade] Integrating new upstream version, libhtml-parser-perl (3.60)

Modified:
    branches/upstream/libhtml-parser-perl/current/Changes
    branches/upstream/libhtml-parser-perl/current/META.yml
    branches/upstream/libhtml-parser-perl/current/Parser.pm
    branches/upstream/libhtml-parser-perl/current/Parser.xs
    branches/upstream/libhtml-parser-perl/current/TODO
    branches/upstream/libhtml-parser-perl/current/eg/htextsub
    branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm
    branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm
    branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm
    branches/upstream/libhtml-parser-perl/current/t/headparser.t

Modified: branches/upstream/libhtml-parser-perl/current/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Changes?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Changes (original)
+++ branches/upstream/libhtml-parser-perl/current/Changes Mon Feb  9 22:12:32 2009
@@ -1,3 +1,25 @@
+_______________________________________________________________________________
+2009-02-09  Release 3.60
+
+Ville Skytta (5):
+      Spelling fixes.
+      Test multi-value headers.
+      Documentation improvements.
+      Do not terminate head parsing on the <object> element (added in HTML 4.0).
+      Add support for HTML 5 <meta charset> and new HEAD elements.
+
+Damyan Ivanov (1):
+      Short description of the htextsub example
+
+Mike South (1):
+      Suppress warning when encode_entities is called with undef [RT#27567]
+
+Zefram (1):
+      HTML::Parser doesn't compile with perl 5.8.0.
+
+
+
+_______________________________________________________________________________
 2008-11-24   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.59
@@ -8,6 +30,7 @@
 
 
 
+_______________________________________________________________________________
 2008-11-17   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.58
@@ -21,6 +44,7 @@
 
 
 
+_______________________________________________________________________________
 2008-11-16   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.57
@@ -37,11 +61,12 @@
 
 
 
+_______________________________________________________________________________
 2007-01-12   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.56
 
-     Cloning of parser state for compatiblity with threads.
+     Cloning of parser state for compatibility with threads.
      Fixed by Bo Lindbergh <blgl at hagernas.com>.
 
      Don't require whitespace between declaration tokens.
@@ -49,6 +74,7 @@
 
 
 
+_______________________________________________________________________________
 2006-07-10   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.55
@@ -56,7 +82,7 @@
      Treat <> at the end of document as text.  Used to be
      reported as a comment.
 
-     Improved Firefox compatiblity for bad HTML:
+     Improved Firefox compatibility for bad HTML:
       - Unclosed <script>, <style> are now treated as empty tags.
       - Unclosed <textarea>, <xmp> and <plaintext> treat rest as text.
       - Unclosed <title> closes at next tag.
@@ -65,6 +91,7 @@
 
 
 
+_______________________________________________________________________________
 2006-04-28   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.54
@@ -78,6 +105,7 @@
 
 
 
+_______________________________________________________________________________
 2006-04-27   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.53
@@ -90,6 +118,7 @@
 
 
 
+_______________________________________________________________________________
 2006-04-26   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.52
@@ -103,6 +132,7 @@
 
 
 
+_______________________________________________________________________________
 2006-03-22   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.51
@@ -120,6 +150,7 @@
 
 
 
+_______________________________________________________________________________
 2006-02-14   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.50
@@ -129,6 +160,7 @@
 
 
 
+_______________________________________________________________________________
 2006-02-08   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.49
@@ -141,6 +173,7 @@
 
 
 
+_______________________________________________________________________________
 2005-12-02   Gisle Aas <gisle at ActiveState.com>
 
      Release 3.48
@@ -257,7 +290,7 @@
      garbage with older versions of perl.
 
      Emit warning if entities are decoded and something in the first
-     chunk looks like hibit UTF-8.  Previously this warning was only
+     chunk looks like hi-bit UTF-8.  Previously this warning was only
      triggered for documents with BOM.
 
 
@@ -344,7 +377,7 @@
      Release 3.37
 
      Improved handling of HTML encoded surrogate pairs and illegally
-     endoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>.
+     encoded Unicode; <http://rt.cpan.org/Ticket/Display.html?id=7785>.
      Patch by John Gardiner Myers <jgmyers at proofpoint.com>.
 
      Avoid generating bad UTF8 strings when decoding entities
@@ -599,7 +632,7 @@
 
      Release 3.21
 
-     Fix a memory leak which occured when using filter methods.
+     Fix a memory leak which occurred when using filter methods.
 
      Avoid a few compiler warnings (DEC C):
         - Trailing comma found in enumerator list
@@ -1274,7 +1307,7 @@
 
    Faster HTML::LinkExtor by taking advantage of the new
    callback interface.  The module now also uses URI.pm (instead
-   of the old URI::URL) to do URI-absolutations.
+   of the old URI::URL) to absolutize URIs.
 
    Faster HTML::TokeParser by taking advantage of new
    accum interface.
@@ -1408,7 +1441,7 @@
    instead of raising an exception, and strings like "*STDIN" are not
    treated as globs any more.
 
-   HTML::LinkExtor knowns about background attribute of <tables>.
+   HTML::LinkExtor knows about background attribute of <tables>.
    Patch by Clinton Wong <clintdw at netcom.com>
 
    HTML::TokeParser will parse large inline strings much faster now.
@@ -1491,7 +1524,7 @@
 
    Release 2.16
    
-   The HTML::Parser could some times break hex entites (like &#xFFFF;)
+   The HTML::Parser could some times break hex entities (like &#xFFFF;)
    in the middle.
 
    Removed remaining forced dependencies on libwww-perl modules.  It

Modified: branches/upstream/libhtml-parser-perl/current/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/META.yml?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/META.yml (original)
+++ branches/upstream/libhtml-parser-perl/current/META.yml Mon Feb  9 22:12:32 2009
@@ -1,6 +1,6 @@
 --- #YAML:1.0
 name:               HTML-Parser
-version:            3.59
+version:            3.60
 abstract:           HTML parser class
 author:
     - Gisle Aas <gisle at activestate.com>
@@ -21,7 +21,7 @@
     directory:
         - t
         - inc
-generated_by:       ExtUtils::MakeMaker version 6.48
+generated_by:       ExtUtils::MakeMaker version 6.4801
 meta-spec:
     url:      http://module-build.sourceforge.net/META-spec-v1.4.html
     version:  1.4

Modified: branches/upstream/libhtml-parser-perl/current/Parser.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Parser.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Parser.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/Parser.pm Mon Feb  9 22:12:32 2009
@@ -9,7 +9,7 @@
 use strict;
 use vars qw($VERSION @ISA);
 
-$VERSION = "3.59";
+$VERSION = "3.60";
 
 require HTML::Entities;
 
@@ -334,8 +334,8 @@
 =item $p->backquote( $bool )
 
 By default, only ' and " are recognized as quote characters around
-attribute values.  MSIE also recognize backquotes for some reason.
-Enabling this attribute provide compatiblity with this behaviour.
+attribute values.  MSIE also recognizes backquotes for some reason.
+Enabling this attribute provides compatibility with this behaviour.
 
 =item $p->boolean_attribute_value( $val )
 
@@ -1200,7 +1200,7 @@
 The parser can process raw undecoded UTF-8 sanely if the C<utf8_mode>
 is enabled or if the "attr", "@attr" or "dtext" argspecs is avoided.
 
-=item Parsing string decoded with wrong endianess
+=item Parsing string decoded with wrong endianness
 
 (W) The first character in the document is U+FFFE.  This is not a
 legal Unicode character but a byte swapped BOM.  The result of parsing

Modified: branches/upstream/libhtml-parser-perl/current/Parser.xs
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/Parser.xs?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/Parser.xs (original)
+++ branches/upstream/libhtml-parser-perl/current/Parser.xs Mon Feb  9 22:12:32 2009
@@ -96,6 +96,10 @@
    #define DOWARN (PL_dowarn & G_WARN_ON)
 #else
    #define DOWARN PL_dowarn
+#endif
+
+#ifndef CLONEf_JOIN_IN
+   #define CLONEf_JOIN_IN 0
 #endif
 
 /*

Modified: branches/upstream/libhtml-parser-perl/current/TODO
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/TODO?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/TODO (original)
+++ branches/upstream/libhtml-parser-perl/current/TODO Mon Feb  9 22:12:32 2009
@@ -2,7 +2,7 @@
 
  - limit the length of markup elements that never end.   Perhaps by
    configurable limits on the length that markup can have and still
-   be recongnized.  Report stuff as 'text' when this happens?
+   be recognized.  Report stuff as 'text' when this happens?
  - remove 255 char limit on literal argspec strings
  - implement backslash escapes in literal argspec string
  - <![%app1;[...]]> (parameter entities)

Modified: branches/upstream/libhtml-parser-perl/current/eg/htextsub
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/eg/htextsub?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/eg/htextsub (original)
+++ branches/upstream/libhtml-parser-perl/current/eg/htextsub Mon Feb  9 22:12:32 2009
@@ -1,4 +1,8 @@
 #!/usr/bin/perl -w
+
+# Shows how to mangle all plain  text in an HTML document, using an arbitrary
+# Perl expression. Plain text is all text not within a tag declaration, i.e.
+# not in <p ...>, but possibly between <p> and </p>
 
 use strict;
 my $code = shift || usage();

Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/Entities.pm Mon Feb  9 22:12:32 2009
@@ -57,7 +57,7 @@
 single character strings.  If a key has ";" as suffix,
 then occurrences in $string are only expanded if properly terminated
 with ";".  Entities without ";" will be expanded regardless of how
-they are terminated for compatiblity with how common browsers treat
+they are terminated for compatibility with how common browsers treat
 entities in the Latin-1 range.
 
 If $expand_prefix is TRUE then entities without trailing ";" in
@@ -139,7 +139,7 @@
 @EXPORT = qw(encode_entities decode_entities _decode_entities);
 @EXPORT_OK = qw(%entity2char %char2entity encode_entities_numeric);
 
-$VERSION = "3.57";
+$VERSION = "3.60";
 sub Version { $VERSION; }
 
 require HTML::Parser;  # for fast XS implemented decode_entities
@@ -446,6 +446,7 @@
 
 sub encode_entities
 {
+    return undef unless defined $_[0];
     my $ref;
     if (defined wantarray) {
 	my $x = $_[0];
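
The one-line guard added to encode_entities above can be illustrated in isolation. What follows is a minimal sketch, not the module's code: encode_basic and %ent are hypothetical stand-ins that handle only four characters and cover only the return-value path, using core Perl so it runs without HTML::Parser installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for encode_entities: escapes only & < > "
my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub encode_basic {
    # Same guard as the line added in 3.60: bail out early on undef
    # so the substitution below never sees an uninitialized value.
    return undef unless defined $_[0];
    my $text = $_[0];
    $text =~ s/([&<>"])/$ent{$1}/g;
    return $text;
}

# Collect warnings so we can check that undef input stays silent.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $encoded = encode_basic('a < b & "c"');
my $nothing = encode_basic(undef);

print "$encoded\n";                                  # a &lt; b &amp; &quot;c&quot;
print defined $nothing ? "defined\n" : "undef\n";    # undef
print scalar(@warnings), " warnings\n";              # 0 warnings
```

Without the early return, the substitution on an undefined $text would emit "Use of uninitialized value" under warnings, which appears to be the situation the Changes entry for RT#27567 describes.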

Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/HeadParser.pm Mon Feb  9 22:12:32 2009
@@ -13,6 +13,8 @@
  $p->header('Title')          # to access <title>....</title>
  $p->header('Content-Base')   # to access <base href="http://...">
  $p->header('Foo')            # to access <meta http-equiv="Foo" content="...">
+ $p->header('X-Meta-Author')  # to access <meta name="author" content="...">
+ $p->header('X-Meta-Charset') # to access <meta charset="...">
 
 =head1 DESCRIPTION
 
@@ -52,9 +54,18 @@
 
 =item X-Meta-Foo:
 
-All E<lt>meta> elements will initialize headers with the prefix
-"C<X-Meta->" on the name.  If the E<lt>meta> element contains a
-C<http-equiv> attribute, then it will be honored as the header name.
+All E<lt>meta> elements containing a C<name> attribute will result in
+headers using the prefix C<X-Meta-> appended with the value of the
+C<name> attribute as the name of the header, and the value of the
+C<content> attribute as the pushed header value.
+
+E<lt>meta> elements containing a C<http-equiv> attribute will result
+in headers as in above, but without the C<X-Meta-> prefix in the
+header name.
+
+E<lt>meta> elements containing a C<charset> attribute will result in
+an C<X-Meta-Charset> header, using the value of the C<charset>
+attribute as the pushed header value.
 
 =back
 
@@ -76,7 +87,7 @@
 use strict;
 use vars qw($VERSION $DEBUG);
 #$DEBUG = 1;
-$VERSION = "3.59";
+$VERSION = "3.60";
 
 =item $hp = HTML::HeadParser->new
 
@@ -85,7 +96,7 @@
 The object constructor.  The optional $header argument should be a
 reference to an object that implement the header() and push_header()
 methods as defined by the C<HTTP::Headers> class.  Normally it will be
-of some class that isa or delegates to the C<HTTP::Headers> class.
+of some class that is a or delegates to the C<HTTP::Headers> class.
 
 If no $header is given C<HTML::HeadParser> will create an
 C<HTTP::Header> object by itself (initially empty).
@@ -157,7 +168,14 @@
 #                            SCRIPT* & META* & LINK*">
 #
 # <!ELEMENT HEAD O O  (%head.content)>
-
+#
+# From HTML 4.01:
+#
+# <!ENTITY % head.misc "SCRIPT|STYLE|META|LINK|OBJECT">
+# <!ENTITY % head.content "TITLE & BASE?">
+# <!ELEMENT HEAD O O (%head.content;) +(%head.misc;)>
+#
+# Added in HTML 5: noscript, eventsource, command
 
 sub start
 {
@@ -167,8 +185,15 @@
     if ($tag eq 'meta') {
 	my $key = $attr->{'http-equiv'};
 	if (!defined($key) || !length($key)) {
-	    return unless $attr->{'name'};
-	    $key = "X-Meta-\u$attr->{'name'}";
+	    if ($attr->{name}) {
+		$key = "X-Meta-\u$attr->{name}";
+	    } elsif ($attr->{charset}) { # HTML 5 <meta charset="...">
+		$key = "X-Meta-Charset";
+		$self->{header}->push_header($key => $attr->{charset});
+		return;
+	    } else {
+		return;
+	    }
 	}
 	$self->{'header'}->push_header($key => $attr->{content});
     } elsif ($tag eq 'base') {
@@ -178,7 +203,8 @@
 	# This is a non-standard header.  Perhaps we should just ignore
 	# this element
 	$self->{'header'}->push_header(Isindex => $attr->{prompt} || '?');
-    } elsif ($tag =~ /^(?:title|script|style)$/) {
+    } elsif ($tag =~ /^(?:title|(?:no)?script|style|object
+		      |eventsource|command)$/x) {
 	# Just remember tag.  Initialize header when we see the end tag.
 	$self->{'tag'} = $tag;
     } elsif ($tag eq 'link') {
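
The header-naming rules for <meta> elements after this change can be summarised as a small pure function. This is a sketch under assumptions: meta_to_header is a hypothetical helper written for illustration, not part of HTML::HeadParser, and it returns the name/value pair instead of calling push_header() on a header object.

```perl
use strict;
use warnings;

# Hypothetical helper: given a <meta> attribute hash, return the
# (header name, header value) pair, or an empty list when the
# element produces no header.
sub meta_to_header {
    my ($attr) = @_;

    # http-equiv wins and is used as the header name unchanged.
    my $key = $attr->{'http-equiv'};
    return ($key, $attr->{content}) if defined $key && length $key;

    # name="..." becomes X-Meta-<Name>, first letter upper-cased.
    return ("X-Meta-\u$attr->{name}", $attr->{content}) if $attr->{name};

    # HTML 5 <meta charset="..."> carries its value in the attribute itself.
    return ('X-Meta-Charset', $attr->{charset}) if $attr->{charset};

    return;
}

my @cases = (
    { 'http-equiv' => 'Expires', content => 'Soon' },
    { name => 'keywords', content => 'more' },
    { charset => 'ISO-8859-1' },
    { content => 'orphan' },
);

for my $attr (@cases) {
    my @pair = meta_to_header($attr);
    print @pair ? "$pair[0]: $pair[1]\n" : "(no header)\n";
}
# Expires: Soon
# X-Meta-Keywords: more
# X-Meta-Charset: ISO-8859-1
# (no header)
```

In the real module the pair is pushed onto the header object, so repeated elements such as the two Keywords lines in t/headparser.t accumulate as multiple values.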

Modified: branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm (original)
+++ branches/upstream/libhtml-parser-perl/current/lib/HTML/LinkExtor.pm Mon Feb  9 22:12:32 2009
@@ -2,7 +2,7 @@
 
 require HTML::Parser;
 @ISA = qw(HTML::Parser);
-$VERSION = "3.57";
+$VERSION = "3.60";
 
 =head1 NAME
 
@@ -104,7 +104,7 @@
 =item $p->links
 
 Returns a list of all links found in the document.  The returned
-values will be anonymous arrays with the follwing elements:
+values will be anonymous arrays with the following elements:
 
   [$tag, $attr => $url1, $attr2 => $url2,...]
 
@@ -155,7 +155,7 @@
   }
 
   # Make the parser.  Unfortunately, we don't know the base yet
-  # (it might be diffent from $url)
+  # (it might be different from $url)
   $p = HTML::LinkExtor->new(\&callback);
 
   # Request document and parse it as it arrives

Modified: branches/upstream/libhtml-parser-perl/current/t/headparser.t
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-parser-perl/current/t/headparser.t?rev=30534&op=diff
==============================================================================
--- branches/upstream/libhtml-parser-perl/current/t/headparser.t (original)
+++ branches/upstream/libhtml-parser-perl/current/t/headparser.t Mon Feb  9 22:12:32 2009
@@ -1,7 +1,7 @@
 #!perl -w
 
 use strict;
-use Test::More tests => 13;
+use Test::More tests => 15;
 
 { package H;
   sub new { bless {}, shift; }
@@ -55,10 +55,14 @@
     ignore this
 
 </script>
+<noscript> ... and this </noscript>
+
+<object classid="foo">
 
 <base href="http://www.sn.no">
 <meta name="Keywords" content="test, test, test,...">
 <meta name="Keywords" content="more">
+<meta charset="ISO-8859-1"><!-- HTML 5 -->
 
 Dette er vanlig tekst.  Denne teksten definerer også slutten på
 &lt;head> delen av dokumentet.
@@ -91,6 +95,8 @@
 like($p->header('Title'), qr/Å være eller å ikke være/);
 is($p->header('Expires'), 'Soon');
 is($p->header('Content-Base'), 'http://www.sn.no');
+is_deeply($p->header('X-Meta-Keywords'), ['test, test, test,...', 'more']);
+is($p->header('X-Meta-Charset'), 'ISO-8859-1');
 like($p->header('Link'), qr/<mailto:gisle\@aas.no>/);
 
 # This header should not be present because the head ended
