r35241 - in /branches/upstream/libhtml-wikiconverter-markdown-perl/current: Changes MANIFEST META.yml Makefile.PL README lib/HTML/WikiConverter/Markdown.pm t/01-markdown.t t/markdown.t

eloy at users.alioth.debian.org
Tue May 12 10:35:27 UTC 2009


Author: eloy
Date: Tue May 12 10:35:22 2009
New Revision: 35241

URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=35241
Log:
[svn-upgrade] Integrating new upstream version, libhtml-wikiconverter-markdown-perl (0.05)

Added:
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/01-markdown.t
Removed:
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/markdown.t
Modified:
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/Changes
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/MANIFEST
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/META.yml
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/Makefile.PL
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/README
    branches/upstream/libhtml-wikiconverter-markdown-perl/current/lib/HTML/WikiConverter/Markdown.pm

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/Changes?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/Changes (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/Changes Tue May 12 10:35:22 2009
@@ -1,4 +1,18 @@
 # Revision history for HTML::WikiConverter::Markdown
+
+date: 2009-03-16
+version: 0.05
+changes:
+  - requires HTML::WikiConverter 0.67
+  - (bug #43997) properly handles multiline code blocks
+
+date: 2009-03-13
+version: 0.04
+changes:
+  - correct handling of blockquotes containing only phrasal elements
+  - (bug #43988) properly escape backticks within code tags
+  - (bug #43993) don't escape underscores within code tags
+  - (bug #43996) decode specific html entities within code tags
 
 date: 2008-11-14
 version: 0.03

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/MANIFEST?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/MANIFEST (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/MANIFEST Tue May 12 10:35:22 2009
@@ -6,7 +6,7 @@
 lib/HTML/WikiConverter/Markdown.pm
 t/00-load.t
 t/boilerplate.t
-t/markdown.t
+t/01-markdown.t
 t/pod-coverage.t
 t/pod.t
 t/runtests.pl

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/META.yml?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/META.yml (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/META.yml Tue May 12 10:35:22 2009
@@ -1,6 +1,6 @@
 --- #YAML:1.0
 name:               HTML-WikiConverter-Markdown
-version:            0.03
+version:            0.05
 abstract:           Convert HTML to Markdown markup
 author:
     - David J. Iberri <diberri at cpan.org>
@@ -10,7 +10,7 @@
     ExtUtils::MakeMaker:  0
 requires:
     HTML::Tagset:         0
-    HTML::WikiConverter:  0.63
+    HTML::WikiConverter:  0.67
     Params::Validate:     0
     Test::More:           0
     URI:                  0

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/Makefile.PL
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/Makefile.PL?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/Makefile.PL (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/Makefile.PL Tue May 12 10:35:22 2009
@@ -11,7 +11,7 @@
     PL_FILES            => {},
     PREREQ_PM => {
         'Test::More' => 0,
-        'HTML::WikiConverter' => 0.63,
+        'HTML::WikiConverter' => 0.67, # for p_strict attribute
         'HTML::Tagset' => 0,
         'Params::Validate' => 0,
         'URI' => 0,

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/README
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/README?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/README (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/README Tue May 12 10:35:22 2009
@@ -1,8 +1,8 @@
 HTML::WikiConverter::Markdown
 =============================
 
-This module adds HTML->Markdown conversion to the HTML::WikiConverter
-module.
+This module adds HTML-to-Markdown conversion to the
+HTML::WikiConverter module.
 
 SYNOPSIS
 
@@ -19,12 +19,6 @@
 There's also a web interface if you're so inclined:
 
   http://diberri.dyndns.org/wikipedia/html2wiki/
-
-DEPENDENCIES
-
-  * HTML::WikiConverter version 0.60
-  * HTML::Tagset
-  * URI
 
 INSTALLATION
 
@@ -57,7 +51,7 @@
 
 COPYRIGHT AND LICENCE
 
-Copyright (C) 2006 David J. Iberri
+Copyright (c) David J. Iberri
 
 This program is free software; you can redistribute it and/or modify it
 under the same terms as Perl itself.

Modified: branches/upstream/libhtml-wikiconverter-markdown-perl/current/lib/HTML/WikiConverter/Markdown.pm
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/lib/HTML/WikiConverter/Markdown.pm?rev=35241&op=diff
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/lib/HTML/WikiConverter/Markdown.pm (original)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/lib/HTML/WikiConverter/Markdown.pm Tue May 12 10:35:22 2009
@@ -4,9 +4,10 @@
 use strict;
 
 use base 'HTML::WikiConverter';
-our $VERSION = '0.03';
+our $VERSION = '0.05';
 
 use Params::Validate ':types';
+use HTML::Entities;
 use HTML::Tagset;
 use URI;
 
@@ -94,9 +95,15 @@
   image_tag_fallback        => { default => 1, type => BOOLEAN },
   unordered_list_style      => { default => 'asterisk', type => SCALAR },
   ordered_list_style        => { default => 'sequential', type => SCALAR },
+
+  # Requires H::WC version 0.67
+  p_strict                  => { default => 0 },
 } }
 
 my @common_attrs = qw/ id class lang dir title style /;
+
+# Hack to accommodate bug #43997 - multiline code blocks
+my $code_block_prefix = 'bqwegsdfbwegadfbnsdfbahwerfgkjnsdfbohqw34t927398y5jnwrteb8uq34inb';
 
 sub rules {
   my $self = shift;
@@ -114,7 +121,8 @@
     em => { alias => 'i' },
     b => { start => '**', end => '**' },
     strong => { alias => 'b' },
-    code => { start => '`', end => '`' },
+    code => { start => \&_code_delim, end => \&_code_delim },
+    code_block => { line_prefix => $code_block_prefix, block => 1 },
 
     a => { replace => \&_link },
     img => { replace => \&_img },
@@ -258,36 +266,131 @@
   my( $self, $node ) = @_;
   return unless $node->tag and $node->parent and $node->parent->tag;
 
-  if( $node->parent->tag eq 'blockquote' and $self->_is_phrase_tag($node->tag) ) {
-    $self->_envelop_elem( $node, HTML::Element->new('p') );
+  if( $node->tag eq 'blockquote' ) {
+    my @non_phrasal_children = grep { ! $self->_is_phrase_tag($_->tag) } $node->content_list;
+    unless( @non_phrasal_children ) { # ie, we have things like <blockquote>blah blah blah</blockquote>, without a <p> or something
+      $self->_envelop_children( $node, HTML::Element->new('p') );
+    }
   } elsif( $node->tag eq '~text' ) {
     $self->_escape_text($node);
-  }
-}
-
-sub _envelop_elem {
-  my( $self, $node, $new_parent ) = @_;
-  my $h = $node->replace_with($new_parent);
-  $new_parent->push_content($h);
-}
-
-my @escapes = qw( \\ \` * _ { } ); # '#', '.', '[', and '!' are handled specially
+
+    # bug #43998
+    $self->_decode_entities_in_code($node)
+      if $node->parent->tag eq 'code' or $node->parent->tag eq 'code_block';
+  }
+}
+
+sub preprocess_tree {
+  my( $self, $root ) = @_;
+  foreach my $node ( $root->descendants ) {
+    # bug #43997 - multiline code blocks
+    if( $self->_text_is_within_code_pre($node) ) {
+      $self->_convert_to_code_block($node);
+    }
+  }
+}
+
+sub _text_is_within_code_pre {
+  my( $self, $node ) = @_;
+  return unless $node->parent->parent and $node->parent->parent->tag;
+
+  # Must be <code><pre>...</pre></code> (or <pre><code>...</code></pre>)
+  my $code_pre = $node->parent->tag eq 'code' && $node->parent->parent->tag eq 'pre';
+  my $pre_code = $node->parent->tag eq 'pre'  && $node->parent->parent->tag eq 'code';
+  return unless $code_pre or $pre_code;
+
+  # Can't be any other nodes in a code block
+  return if $node->left or $node->right;
+  return if $node->parent->left or $node->parent->right;
+
+  return 1;
+}
+
+sub _convert_to_code_block {
+  my( $self, $node ) = @_;
+  $node->parent->parent->replace_with_content->delete;
+  $node->parent->tag( "code_block" );
+}
+
+sub _envelop_children {
+  my( $self, $node, $new_child ) = @_;
+
+  my @children = $node->detach_content;
+  $node->push_content($new_child);
+  $new_child->push_content(@children);
+}
+
+# special handling for: ` _ # . [ !
+my @escapes = qw( \\ * { } _ ` );
+
+my %backslash_escapes = (
+  '\\' => [ '0923fjhtml2wikiescapedbackslash',  "\\\\" ],
+  '*'  => [ '0923fjhtml2wikiescapedasterisk',   "\\*"  ],
+  '{'  => [ '0923fjhtml2wikiescapedopenbrace',  "\\{"  ],
+  '}'  => [ '0923fjhtml2wikiescapedclosebrace', "\\}"  ],
+  '_'  => [ '0923fjhtml2wikiescapedunderscore', "\\_"  ],
+  '`'  => [ '0923fjhtml2wikiescapedbacktick',   "\\`"  ],
+);
 
 sub _escape_text {
   my( $self, $node ) = @_;
   my $text = $node->attr('text') || '';
-  my $escapes = join '', @escapes;
-  $text =~ s/([\Q$escapes\E])/\\$1/g;
-  $text =~ s/^([\d]+)\./$1\\./;
-  $text =~ s/^\#/\\#/;
-  $text =~ s/\!\[/\\![/g;
-  $text =~ s/\]\[/]\\[/g;
+
+  #
+  # (bug #43998)
+  # Only backslash-escape backticks that don't occur within <code>
+  # tags. Those within <code> tags are left alone and the backticks to
+  # signal a <code> tag get upgraded to a double-backtick by
+  # _code_delim().
+  #
+  # (bug #43993)
+  # Likewise, only backslash-escape underscores that occur outside
+  # <code> tags.
+  #
+
+  my $inside_code = $node->look_up( _tag => 'code' ) || $node->look_up( _tag => 'code_block' );
+
+  if( not $inside_code ) {
+    my $escapes = join '', @escapes;
+    $text =~ s/([\Q$escapes\E])/$backslash_escapes{$1}->[0]/g;
+    $text =~ s/^([\d]+)\./$1\\./;
+    $text =~ s/^\#/\\#/;
+    $text =~ s/\!\[/\\![/g;
+    $text =~ s/\]\[/]\\[/g;
+
+    $node->attr( text => $text );
+  }
+}
+
+# bug #43998
+sub _code_delim {
+  my( $self, $node, $rules ) = @_;
+  my $contents = $self->get_elem_contents($node);
+  return $contents =~ /\`/ ? '``' : '`';
+}
+
+# bug #43996
+sub _decode_entities_in_code {
+  my( $self, $node ) = @_;
+  my $text = $node->attr('text') || '';
+  return unless $text;
+
+  HTML::Entities::_decode_entities( $text, { 'amp' => '&', 'lt' => '<', 'gt' => '>' } );
   $node->attr( text => $text );
 }
 
 sub postprocess_output {
   my( $self, $outref ) = @_;
+  $$outref =~ s/\Q$code_block_prefix\E/    /gm;
+  $self->_unescape_text($outref);
   $self->_add_references($outref);
+}
+
+sub _unescape_text {
+  my( $self, $outref ) = @_;
+  foreach my $escape ( values %backslash_escapes ) {
+    $$outref =~ s/$escape->[0]/$escape->[1]/g;
+  }
 }
 
 sub _add_references {

Added: branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/01-markdown.t
URL: http://svn.debian.org/wsvn/pkg-perl/branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/01-markdown.t?rev=35241&op=file
==============================================================================
--- branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/01-markdown.t (added)
+++ branches/upstream/libhtml-wikiconverter-markdown-perl/current/t/01-markdown.t Tue May 12 10:35:22 2009
@@ -1,0 +1,459 @@
+use HTML::WikiConverter;
+
+local $/;
+require 't/runtests.pl';
+runtests( data => <DATA>, dialect => 'Markdown', wiki_uri => 'http://www.test.com/wiki/' );
+close DATA;
+
+__DATA__
+unordered list
+__H__
+<ul>
+<li>one</li>
+<li>two</li>
+<li>three</li>
+</ul>
+__W__
+* one
+* two
+* three
+__NEXT__
+ordered list
+__H__
+<ol>
+<li>one</li>
+<li>two</li>
+<li>three</li>
+</ol>
+__W__
+1. one
+2. two
+3. three
+__NEXT__
+blockquote
+__H__
+<blockquote>text</blockquote>
+__W__
+> text
+__NEXT__
+nested blockquote
+__H__
+<blockquote>text<blockquote>text2</blockquote></blockquote>
+__W__
+> text
+>
+> > text2
+__NEXT__
+nested blockquote cont'd
+__H__
+<blockquote>This is the first level of quoting.
+<blockquote>This is nested blockquote.</blockquote>
+<p>Back to the first level.</p></blockquote>
+__W__
+> This is the first level of quoting.
+>
+> > This is nested blockquote.
+>
+> Back to the first level.
+__NEXT__
+h1
+__H__
+<h1>text</h1>
+__W__
+# text
+__NEXT__
+bold
+__H__
+<b>bold text</b>
+__W__
+**bold text**
+__NEXT__
+italics
+__H__
+<i>text</i>
+__W__
+_text_
+__NEXT__
+strong
+__H__
+<strong>text</strong>
+__W__
+**text**
+__NEXT__
+em
+__H__
+<em>text</em>
+__W__
+_text_
+__NEXT__
+inline link ::link_style('inline')
+__H__
+<p>It's called <a href="http://en.wikipedia.org/wiki/Long-term_potentiation" title="Long-term potentiation">LTP</a>.</p>
+__W__
+It's called [LTP](http://en.wikipedia.org/wiki/Long-term_potentiation Long-term potentiation).
+__NEXT__
+reference link ::link_style('reference')
+__H__
+<p>It's called <a href="http://en.wikipedia.org/wiki/Long-term_potentiation" title="Long-term potentiation">LTP</a>.</p>
+__W__
+It's called [LTP][1].
+
+  [1]: http://en.wikipedia.org/wiki/Long-term_potentiation "Long-term potentiation"
+__NEXT__
+reference link no title ::link_style('inline')
+__H__
+<p><a href="http://example.net/">This link</a> has no title attribute.</p>
+__W__
+[This link](http://example.net/) has no title attribute.
+__NEXT__
+multi-paragraphs with reference links ::link_style('reference')
+__H__
+<p>This is a paragraph with a link to <a href="http://google.com">Google</a>.
+There's also a link to some other stuff, like <a href="http://digg.com">Digg</a>
+and <a href="http://wikipedia.org">Wikipedia</a>.
+
+<p>Here's another paragraph.</p>
+
+<p>This is fun stuff:</p>
+<ul>
+  <li><a href="http://video.google.com" title="Google Video">Google Video is the best!</a></li>
+  <li><a href="http://www.example.org" title="Examples">Example.org is a close second</a></li>
+</ul>
+__W__
+This is a paragraph with a link to [Google][1]. There's also a link to some other stuff, like [Digg][2] and [Wikipedia][3].
+
+Here's another paragraph.
+
+This is fun stuff:
+
+* [Google Video is the best!][4]
+* [Example.org is a close second][5]
+
+  [1]: http://google.com
+  [2]: http://digg.com
+  [3]: http://wikipedia.org
+  [4]: http://video.google.com "Google Video"
+  [5]: http://www.example.org "Examples"
+__NEXT__
+code
+__H__
+<code>printf()</code>
+__W__
+`printf()`
+__NEXT__
+inline image ::image_style('inline')
+__H__
+<img src="http://example.com/delete.png" alt="Delete" title="Click to delete" />
+__W__
+![Delete](http://example.com/delete.png "Click to delete")
+__NEXT__
+reference image ::image_style('reference')
+__H__
+<img src="http://example.com/delete.png" alt="Delete" title="Click to delete" />
+__W__
+![Delete][1]
+
+  [1]: http://example.com/delete.png "Click to delete"
+__NEXT__
+mixed inline images and links ::image_style('inline') ::link_style('inline')
+__H__
+<p>Link goes <a href="http://example.com" title="Link to example.com">Here</a>.
+Image goes below:</p>
+
+<p><img src="http://example.com/logo.png" alt="Logo"/></p>
+__W__
+Link goes [Here](http://example.com Link to example.com). Image goes below:
+
+![Logo](http://example.com/logo.png)
+__NEXT__
+mixed reference images and links ::image_style('reference') ::link_style('reference')
+__H__
+<p>This is a paragraph with a link to <a href="http://google.com">Google</a>.  There's also a link to some other stuff, like <a href="http://digg.com">Digg</a> and <a href="http://wikipedia.org">Wikipedia</a>. <img src="http://example.com/delete.png" alt="Delete" title="Click to delete" /></p>
+__W__
+This is a paragraph with a link to [Google][1]. There's also a link to some other stuff, like [Digg][2] and [Wikipedia][3]. ![Delete][4]
+
+  [1]: http://google.com
+  [2]: http://digg.com
+  [3]: http://wikipedia.org
+  [4]: http://example.com/delete.png "Click to delete"
+__NEXT__
+fallback to tag if image has dimensions ::image_tag_fallback(1)
+__H__
+<img src="http://example.com/origin.png" alt="Thingy" title="The title" width="100" />
+__W__
+<img src="http://example.com/origin.png" width="100" alt="Thingy" title="The title" />
+__NEXT__
+no fallback ::image_tag_fallback(0) ::image_style('inline')
+__H__
+<img src="http://example.com/origin.png" alt="Thingy" title="The title" width="100" />
+__W__
+![Thingy](http://example.com/origin.png "The title")
+__NEXT__
+automatic links
+__H__
+<a href="http://example.com">http://example.com</a>
+__W__
+<http://example.com>
+__NEXT__
+escapes
+__H__
+<p>a backslash \</p>
+<p>a weird combo ![</p>
+<p>a curly brace {</p>
+<p>1992. not a list item!</p>
+__W__
+a backslash \\
+
+a weird combo \![
+
+a curly brace \{
+
+1992\. not a list item!
+__NEXT__
+multi-headers
+__H__
+<h1>One</h1>
+<h2>Two</h2>
+<h3>Three</h3>
+__W__
+# One
+
+## Two
+
+### Three
+__NEXT__
+one-dot lists ::ordered_list_style('one-dot')
+__H__
+<ol>
+<li>one</li>
+<li>two</li>
+<li>three</li>
+</ol>
+__W__
+1. one
+1. two
+1. three
+__NEXT__
+plus lists ::unordered_list_style('plus')
+__H__
+<ul>
+<li>one</li>
+<li>two</li>
+<li>three</li>
+</ul>
+__W__
++ one
++ two
++ three
+__NEXT__
+dash lists ::unordered_list_style('dash')
+__H__
+<ul>
+<li>one</li>
+<li>two</li>
+<li>three</li>
+</ul>
+__W__
+- one
+- two
+- three
+__NEXT__
+forced inline anchors ::force_inline_anchor_links(1) ::unordered_list_style('asterisk')
+__H__
+<ul>
+  <li><a href="#overview">Overview</a>
+    <ul>
+      <li><a href="#philosophy">Philosophy</a></li>
+      <li><a href="#html">Inline HTML</a></li>
+    </ul>
+  </li>
+</ul>
+__W__
+* [Overview](#overview)
+  * [Philosophy](#philosophy)
+  * [Inline HTML](#html)
+__NEXT__
+table
+__H__
+<table>
+  <caption>My favorite animals</caption>
+  <tr>
+    <th>Animal</th>
+    <th>Region</th>
+    <th>Physical traits</th>
+    <th>Food</th>
+  </tr>
+  <tr>
+    <td>Pacman frog</td>
+    <td>Gran Chaco (Argentina)</td>
+    <td>Half mouth, half stomach (quite literally!)</td>
+    <td>Crickets, fish, etc.</td>
+  </tr>
+</table>
+__W__
+<table>
+<caption>My favorite animals</caption>
+<tr>
+<th>Animal</th>
+<th>Region</th>
+<th>Physical traits</th>
+<th>Food</th>
+</tr>
+<tr>
+<td>Pacman frog</td>
+<td>Gran Chaco (Argentina)</td>
+<td>Half mouth, half stomach (quite literally!)</td>
+<td>Crickets, fish, etc.</td>
+</tr>
+</table>
+__NEXT__
+setext header ::header_style('setext')
+__H__
+<h1>header1</h1>
+<p>Fun stuff here.</p>
+<h2>header2</h2>
+<p>More fun stuff!</p>
+__W__
+header1
+=======
+
+Fun stuff here.
+
+header2
+-------
+
+More fun stuff!
+__NEXT__
+more complete example ::header_style('atx')
+__H__
+<h2>Aaron Swartz's html2text</h2>
+
+<p>A handful of people have asked if there's a way to translate Markdown
+in reverse — to turn HTML back into Markdown-formatted plain text.
+The short answer is yes, by using Aaron Swartz's new version of
+<a href="http://www.aaronsw.com/2002/html2text/">html2text</a>:</p>
+
+<blockquote>
+  <p>html2text is a Python script that convers a page of HTML into clean,
+  easy-to-read plain ASCII. Better yet, that ASCII also happens to be
+  valid Markdown (a text-to-HTML format).</p>
+</blockquote>
+
+<p>html2text works so well that I'm planning to use it to convert most of
+my old Daring Fireball articles (the ones I wrote in raw HTML). It's
+worth noting that if you start with a Markdown document, translate it
+to HTML, then use html2text to go back to Markdown, it won't give you
+the exact same document you started with. That sort of complete
+round-trip fidelity simply is not possible, but html2text comes pretty
+close.</p>
+
+<p>Also, much like Markdown and SmartyPants, html2text works as a BBEdit
+text filter. Simply save a copy in the Unix Filters folder in your
+BBEdit Support folder.</p>
+__W__
+## Aaron Swartz's html2text
+
+A handful of people have asked if there's a way to translate Markdown in reverse — to turn HTML back into Markdown-formatted plain text. The short answer is yes, by using Aaron Swartz's new version of [html2text][1]:
+
+> html2text is a Python script that convers a page of HTML into clean, easy-to-read plain ASCII. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
+
+html2text works so well that I'm planning to use it to convert most of my old Daring Fireball articles (the ones I wrote in raw HTML). It's worth noting that if you start with a Markdown document, translate it to HTML, then use html2text to go back to Markdown, it won't give you the exact same document you started with. That sort of complete round-trip fidelity simply is not possible, but html2text comes pretty close.
+
+Also, much like Markdown and SmartyPants, html2text works as a BBEdit text filter. Simply save a copy in the Unix Filters folder in your BBEdit Support folder.
+
+  [1]: http://www.aaronsw.com/2002/html2text/
+__NEXT__
+blockquotes containing only phrasal elements
+__H__
+<p>Via <a href="http://en.wikipedia.org/wiki/Long-term_potentiation">Wikipedia</a>:</p>
+<blockquote>Long-term potentiation is the long-lasting enhancement in communication between two <a href="http://en.wikipedia.org/wiki/Neuron">neurons</a> that lasts from minutes to hours.</blockquote>
+<p>Sweet.</p>
+__W__
+Via [Wikipedia][1]:
+
+> Long-term potentiation is the long-lasting enhancement in communication between two [neurons][2] that lasts from minutes to hours.
+
+Sweet.
+
+  [1]: http://en.wikipedia.org/wiki/Long-term_potentiation
+  [2]: http://en.wikipedia.org/wiki/Neuron
+__NEXT__
+blockquote containing p
+__H__
+<blockquote><p>shouldn't add a paragraph parent</p></blockquote>
+__W__
+> shouldn't add a paragraph parent
+__NEXT__
+__H__
+<blockquote>unmarked paragraph <p>another paragraph</p> <p>yet another</p></blockquote>
+__W__
+> unmarked paragraph
+>
+> another paragraph
+>
+> yet another
+__NEXT__
+code containing backticks (bug #43998)
+__H__
+<p><code>There is a literal backtick (`) here.</code></p>
+__W__
+``There is a literal backtick (`) here.``
+__NEXT__
+amp, lt, gt within code blocks (bug #43996)
+__H__
+<code>print("a &lt; b") if $c > $d</code>
+__W__
+`print("a < b") if $c > $d`
+__NEXT__
+amp, lt, gt within code blocks (bug #43996, example from markdown docs, http://bit.ly/NSrG3)
+__H__
+<p>I strongly recommend against using any
+<code>&lt;blink&gt;</code> tags.</p>
+
+<p>I wish SmartyPants used named entities like <code>&amp;mdash;</code> instead of decimal-encoded entites like <code>&amp;#8212;</code>.</p>
+__W__
+I strongly recommend against using any `<blink>` tags.
+
+I wish SmartyPants used named entities like `&mdash;` instead of decimal-encoded entites like `&#8212;`.
+__NEXT__
+escape literal backticks outside of <code> tags
+__H__
+<p>Hi there, this is a backtick (`).</p>
+__W__
+Hi there, this is a backtick (\`).
+__NEXT__
+don't backslash-escape underscores within <code> tags (bug #43993)
+__H__
+<code>foo _bar_ baz foo_bar</code>
+__W__
+`foo _bar_ baz foo_bar`
+__NEXT__
+but do backslash-escape other underscores
+__H__
+<p>foo _bar_</p>
+__W__
+foo \_bar\_
+__NEXT__
+code blocks
+__H__
+<p>Here's an example:</p>
+
+<code><pre>if( chomp( my $foo = <> ) ) {
+  print "entered: $foo\n";
+}</pre></code>
+__W__
+Here's an example:
+
+    if( chomp( my $foo = <> ) ) {
+      print "entered: $foo\n";
+    }
+__NEXT__
+code blocks
+__H__
+<code><pre>if( chomp( my $foo = <> ) ) {
+  print "entered: $foo\n";
+}</pre></code>
+__W__
+    if( chomp( my $foo = <> ) ) {
+      print "entered: $foo\n";
+    }
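
Note on the escaping changes in Markdown.pm above: the new code replaces
special characters in ordinary text with placeholder tokens, turns those
tokens into backslash escapes in postprocess_output, and picks a single or
double backtick delimiter for <code> depending on the contents. The
following is a minimal standalone sketch of that idea, not the module's
actual code. The token strings and the sub names protect_text,
restore_text and code_span are made up for illustration, and it does not
use HTML::WikiConverter at all.

```perl
use strict;
use warnings;

# Hypothetical placeholder tokens, loosely modelled on %backslash_escapes
# in the diff. Each entry is [ token, final backslash escape ].
my %placeholder = (
  '\\' => [ 'zqxescapedbackslashzqx',  '\\\\' ],
  '*'  => [ 'zqxescapedasteriskzqx',   '\\*'  ],
  '_'  => [ 'zqxescapedunderscorezqx', '\\_'  ],
  '`'  => [ 'zqxescapedbacktickzqx',   '\\`'  ],
);

# Phase 1: swap each special character in ordinary text for its token.
sub protect_text {
  my ($text) = @_;
  my $class = join '', map { quotemeta($_) } keys %placeholder;
  $text =~ s/([$class])/$placeholder{$1}[0]/g;
  return $text;
}

# Phase 2: once the output is assembled, turn the tokens into escapes.
sub restore_text {
  my ($out) = @_;
  for my $pair ( values %placeholder ) {
    my ( $token, $escape ) = @$pair;
    $out =~ s/\Q$token\E/$escape/g;
  }
  return $out;
}

# Inline code is left unescaped; the delimiter is doubled when the
# contents already contain a backtick.
sub code_span {
  my ($code) = @_;
  my $delim = $code =~ /`/ ? '``' : '`';
  return $delim . $code . $delim;
}

print restore_text( protect_text('foo _bar_') ), "\n";  # foo \_bar\_
print code_span('foo _bar_'), "\n";                     # `foo _bar_`
print code_span('a literal backtick (`) here'), "\n";   # ``a literal backtick (`) here``
```

The three printed lines correspond to the underscore and backtick cases
added to t/01-markdown.t in this revision.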



