r5466 - in /packages/libmarc-charset-perl/trunk: Changes MANIFEST META.yml README debian/changelog debian/control lib/MARC/Charset.pm lib/MARC/Charset/Constants.pm t/entities.t t/escape1.t t/escape2.t t/marc8_to_utf8.t t/utf8.t
gregoa-guest at users.alioth.debian.org
Fri May 18 22:53:45 UTC 2007
Author: gregoa-guest
Date: Fri May 18 22:53:45 2007
New Revision: 5466
URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=5466
Log:
* New upstream release.
* Set Standards-Version to 3.7.2 (no changes).
Added:
packages/libmarc-charset-perl/trunk/t/marc8_to_utf8.t
- copied unchanged from r5465, packages/libmarc-charset-perl/branches/upstream/current/t/marc8_to_utf8.t
Removed:
packages/libmarc-charset-perl/trunk/t/entities.t
Modified:
packages/libmarc-charset-perl/trunk/Changes
packages/libmarc-charset-perl/trunk/MANIFEST
packages/libmarc-charset-perl/trunk/META.yml
packages/libmarc-charset-perl/trunk/README
packages/libmarc-charset-perl/trunk/debian/changelog
packages/libmarc-charset-perl/trunk/debian/control
packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm
packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm
packages/libmarc-charset-perl/trunk/t/escape1.t
packages/libmarc-charset-perl/trunk/t/escape2.t
packages/libmarc-charset-perl/trunk/t/utf8.t
Modified: packages/libmarc-charset-perl/trunk/Changes
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/Changes?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/Changes (original)
+++ packages/libmarc-charset-perl/trunk/Changes Fri May 18 22:53:45 2007
@@ -1,8 +1,14 @@
Revision history for MARC::Charset
-0.95 Tue Feb 7 11:38:05 EST 2006
- - bugfix in combining character handling (thanks Mike Rylander)
- - added t/entities.t
+0.96 Wed Mar 14 01:24:48 EDT 2007
+ - added ignore_errors() to skip MARC8 -> UTF8 snafus
+ - added assume_encoding() to treat transcoding failures as if they
+ are from a known, specific encoding. Useful if you have a set of
+ records that, for instance, report being MARC8 but are actually
+ encoded in Latin1 (which, btw, is completely invalid and also very
+ common). Only in effect when ignore_errors() is true.
+ - added assume_unicode() to treat invalid MARC8 as UTF8. This is a
+ convenience function based on assume_encoding().
0.92 Sat Feb 4 19:34:19 CST 2006
- marc8_to_utf8 and utf8_to_marc8 needed to pass along spaces
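The 0.96 entry above describes assume_encoding() as a fallback for records that claim to be MARC8 but actually hold bytes in some other encoding. A minimal sketch of what such a fallback amounts to, using only the core Encode and Unicode::Normalize modules. The helper name fallback_decode is hypothetical and not part of MARC::Charset; it only mirrors the NFC(decode(...)) call that appears in the Charset.pm hunk of this commit.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: reinterpret the raw bytes in a known encoding
# and normalize the resulting character string to NFC.
sub fallback_decode {
    my ($encoding, $bytes) = @_;
    return NFC(decode($encoding, $bytes));
}

# In Latin-1 the single byte 0xE9 is "e with acute".
my $text = fallback_decode('iso-8859-1', "caf\xE9");
printf "U+%04X\n", ord(substr($text, 3, 1));   # prints U+00E9
```

With the real module, the equivalent setup would presumably be the two class-method calls documented in this commit: ignore_errors(1) followed by assume_encoding() with the encoding name.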
Modified: packages/libmarc-charset-perl/trunk/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/MANIFEST?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/MANIFEST (original)
+++ packages/libmarc-charset-perl/trunk/MANIFEST Fri May 18 22:53:45 2007
@@ -8,7 +8,7 @@
lib/MARC/Charset/Constants.pm
lib/MARC/Charset/Table.pm
Makefile.PL
-MANIFEST
+MANIFEST This list of files
META.yml
README
t/cjk.t
@@ -19,7 +19,6 @@
t/code.t
t/cyrillic.marc
t/decompose.t
-t/entities.t
t/escape1.t
t/escape2.t
t/hebrew1.marc
Modified: packages/libmarc-charset-perl/trunk/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/META.yml?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/META.yml (original)
+++ packages/libmarc-charset-perl/trunk/META.yml Fri May 18 22:53:45 2007
@@ -1,7 +1,7 @@
# http://module-build.sourceforge.net/META-spec.html
#XXXXXXX This is a prototype!!! It will change in the future!!! XXXXX#
name: MARC-Charset
-version: 0.95
+version: 0.96
version_from: lib/MARC/Charset.pm
installdirs: site
requires:
Modified: packages/libmarc-charset-perl/trunk/README
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/README?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/README (original)
+++ packages/libmarc-charset-perl/trunk/README Fri May 18 22:53:45 2007
@@ -21,7 +21,7 @@
Unicode notwithstanding, libraries still have a wealth of data encoded using
MARC-8. Yet, some new data formats such as XML require that characters are
encoded using Unicode. In order to facilitate conversion the Library of
-Congress graciously published character mappings to enable the conversion
+Congress graciously published character mappings to facilitate the conversion
of MARC-8 data to Unicode.
MARC::Charset is basically an implementation of the character mappings that
Modified: packages/libmarc-charset-perl/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/debian/changelog?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/debian/changelog (original)
+++ packages/libmarc-charset-perl/trunk/debian/changelog Fri May 18 22:53:45 2007
@@ -1,3 +1,10 @@
+libmarc-charset-perl (0.96-1) unstable; urgency=low
+
+ * New upstream release.
+ * Set Standards-Version to 3.7.2 (no changes).
+
+ -- gregor herrmann <gregor+debian at comodo.priv.at> Sat, 19 May 2007 00:53:27 +0200
+
libmarc-charset-perl (0.95-2) unstable; urgency=low
* Fix typo in Description
Modified: packages/libmarc-charset-perl/trunk/debian/control
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/debian/control?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/debian/control (original)
+++ packages/libmarc-charset-perl/trunk/debian/control Fri May 18 22:53:45 2007
@@ -5,7 +5,7 @@
Build-Depends-Indep: perl (>= 5.8.0-7), libxml-sax-perl, libclass-accessor-perl, libtest-pod-perl
Maintainer: Debian Perl Group <pkg-perl-maintainers at lists.alioth.debian.org>
Uploaders: gregor herrmann <gregor+debian at comodo.priv.at>
-Standards-Version: 3.6.2
+Standards-Version: 3.7.2
XS-Vcs-Svn: svn://svn.debian.org/pkg-perl/packages/libmarc-charset-perl/trunk/
Package: libmarc-charset-perl
Modified: packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm (original)
+++ packages/libmarc-charset-perl/trunk/lib/MARC/Charset.pm Fri May 18 22:53:45 2007
@@ -1,6 +1,6 @@
package MARC::Charset;
-our $VERSION = '0.95';
+our $VERSION = '0.96';
use strict;
use warnings;
@@ -8,6 +8,7 @@
our @EXPORT_OK = qw(marc8_to_utf8 utf8_to_marc8);
use Unicode::Normalize;
+use Encode 'decode';
use MARC::Charset::Table;
use MARC::Charset::Constants qw(:all);
@@ -47,6 +48,72 @@
our $DEFAULT_G0 = ASCII_DEFAULT;
our $DEFAULT_G1 = EXTENDED_LATIN;
+=head2 ignore_errors()
+
+Tells MARC::Charset whether or not to ignore all encoding errors, and
+returns the current setting. This is helpful if you have records that
+contain both MARC8 and UNICODE characters.
+
+ my $ignore = MARC::Charset->ignore_errors();
+
+ MARC::Charset->ignore_errors(1); # ignore errors
+ MARC::Charset->ignore_errors(0); # DO NOT ignore errors
+
+=cut
+
+
+our $_ignore_errors = 0;
+sub ignore_errors {
+ my ($self,$i) = @_;
+ $_ignore_errors = $i if (defined($i));
+ return $_ignore_errors;
+}
+
+
+=head2 assume_unicode()
+
+Tells MARC::Charset whether or not to assume UNICODE when an error is
+encountered in ignore_errors mode and returns the current setting.
+This is helpful if you have records that contain both MARC8 and UNICODE
+characters.
+
+ my $setting = MARC::Charset->assume_unicode();
+
+ MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
+ MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
+
+=cut
+
+
+our $_assume = '';
+sub assume_unicode {
+ my ($self,$i) = @_;
+ $_assume = 'utf8' if (defined($i) and $i);
+ return 1 if ($_assume eq 'utf8');
+}
+
+
+=head2 assume_encoding()
+
+Tells MARC::Charset whether or not to assume a specific encoding when an error
+is encountered in ignore_errors mode and returns the current setting. This
+is helpful if you have records that contain both MARC8 and other characters.
+
+ my $setting = MARC::Charset->assume_encoding();
+
+ MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
+ MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
+
+=cut
+
+
+sub assume_encoding {
+ my ($self,$i) = @_;
+ $_assume = $i if (defined($i));
+ return $_assume;
+}
+
+
# place holders for working graphical character sets
my $G0;
my $G1;
@@ -58,9 +125,15 @@
my $utf8 = marc8_to_utf8($marc8);
If you'd like to ignore errors pass in a true value as the 2nd
-parameter:
+parameter or call MARC::Charset->ignore_errors() with a true
+value:
my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
+
+ or
+
+ MARC::Charset->ignore_errors(1);
+ my $utf8 = marc8_to_utf8($marc8);
=cut
@@ -70,6 +143,8 @@
my ($marc8, $ignore_errors) = @_;
reset_charsets();
+ $ignore_errors = $_ignore_errors if (!defined($ignore_errors));
+
# holder for our utf8
my $utf8 = '';
@@ -95,14 +170,14 @@
}
my $found;
- CHARSET_LOOP: foreach my $charset ($G0, $G1)
+ CHARSET_LOOP: foreach my $charset ($G0, $G1)
{
# cjk characters are a string of three chars
- my $char_size = $charset eq CJK ? 3 : 1;
+ my $char_size = $charset eq CJK ? 3 : 1;
# extract the next code point to examine
- my $chunk = substr($marc8, $index, $char_size);
+ my $chunk = substr($marc8, $index, $char_size);
# look up the character to see if it's in our mapping
my $code = $table->lookup_by_marc8($charset, $chunk);
@@ -118,7 +193,7 @@
if ($code->is_combining())
{
$combining .= $code->char_value();
- }
+ }
else
{
$utf8 .= $code->char_value() . $combining;
@@ -127,18 +202,23 @@
$index += $char_size;
next CHAR_LOOP;
- }
+ }
if (!$found)
{
- warn("no mapping found at position $index in $marc8 ".
+ warn(sprintf("no mapping found for [0x\%X] at position $index in $marc8 ".
"g0=".MARC::Charset::Constants::charset_name($G0) . " " .
- "g1=".MARC::Charset::Constants::charset_name($G1));
+ "g1=".MARC::Charset::Constants::charset_name($G1), unpack('C',substr($marc8,$index,1))));
if (!$ignore_errors)
{
reset_charsets();
return;
}
+ if ($_assume)
+ {
+ reset_charsets();
+ return NFC(decode($_assume => $marc8));
+ }
$index += 1;
}
@@ -162,6 +242,11 @@
parameter:
my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
+
+ or
+
+ MARC::Charset->ignore_errors(1);
+ my $marc8 = utf8_to_marc8($utf8);
=cut
@@ -169,6 +254,8 @@
{
my ($utf8, $ignore_errors) = @_;
reset_charsets();
+
+ $ignore_errors = $_ignore_errors if (!defined($ignore_errors));
# decompose combined characters
$utf8 = NFD($utf8);
@@ -334,22 +421,22 @@
}
elsif ( $esc_char_1 eq MULTI_G0_A ) {
- $G0 = $esc_char_2;
+ $G0 = $esc_char_2;
return $left+3;
}
elsif ($esc_chars eq MULTI_G0_B
and ($left+3 < $right))
{
- $G0 = substr($$str_ref, $left+3, 1);
- return $left+4;
+ $G0 = substr($$str_ref, $left+3, 1);
+ return $left+4;
}
elsif (($esc_chars eq MULTI_G1_A or $esc_chars eq MULTI_G1_B)
and ($left + 3 < $right))
{
- $G1 = substr($$str_ref, $left+3, 1);
- return $left+4;
+ $G1 = substr($$str_ref, $left+3, 1);
+ return $left+4;
}
# we should never get here
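One thing worth noting about the accessors added in this file: the POD for assume_unicode() documents passing 0 to switch the setting off, but the setter as committed only ever assigns 'utf8' and never clears it. A sketch of an accessor pair that honours both branches and always returns an explicit value. This is an illustration only, not the upstream code, and the package name Sketch::Settings is hypothetical.

```perl
use strict;
use warnings;

package Sketch::Settings;   # hypothetical package, for illustration only

my $assume = '';            # name of the fallback encoding, '' means none

# Get or set the fallback encoding name.
sub assume_encoding {
    my ($class, $enc) = @_;
    $assume = $enc if defined $enc;
    return $assume;
}

# Get or set the "assume UTF-8" flag, layered on the same variable.
sub assume_unicode {
    my ($class, $flag) = @_;
    if (defined $flag) {
        if    ($flag)             { $assume = 'utf8' }
        elsif ($assume eq 'utf8') { $assume = '' }   # only clear our own setting
    }
    return $assume eq 'utf8' ? 1 : 0;
}

package main;

Sketch::Settings->assume_unicode(1);
print Sketch::Settings->assume_encoding(), "\n";   # prints utf8
Sketch::Settings->assume_unicode(0);
print Sketch::Settings->assume_unicode(), "\n";    # prints 0
```

Clearing only when the current value is 'utf8' is a design choice of this sketch, so that assume_unicode(0) does not discard an unrelated encoding set through assume_encoding().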
Modified: packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm (original)
+++ packages/libmarc-charset-perl/trunk/lib/MARC/Charset/Constants.pm Fri May 18 22:53:45 2007
@@ -19,46 +19,46 @@
use warnings;
use base qw( Exporter );
-use constant ESCAPE => chr(0x1B);
+use constant ESCAPE => chr(0x1B);
-use constant SINGLE_G0_A => chr(0x28);
-use constant SINGLE_G0_B => chr(0x2C);
-use constant MULTI_G0_A => chr(0x24);
-use constant MULTI_G0_B => chr(0x24) . chr(0x2C);
+use constant SINGLE_G0_A => chr(0x28);
+use constant SINGLE_G0_B => chr(0x2C);
+use constant MULTI_G0_A => chr(0x24);
+use constant MULTI_G0_B => chr(0x24) . chr(0x2C);
-use constant SINGLE_G1_A => chr(0x29);
-use constant SINGLE_G1_B => chr(0x2D);
-use constant MULTI_G1_A => chr(0x24) . chr(0x29);
-use constant MULTI_G1_B => chr(0x24) . chr(0x2D);
+use constant SINGLE_G1_A => chr(0x29);
+use constant SINGLE_G1_B => chr(0x2D);
+use constant MULTI_G1_A => chr(0x24) . chr(0x29);
+use constant MULTI_G1_B => chr(0x24) . chr(0x2D);
-use constant GREEK_SYMBOLS => chr(0x67);
-use constant SUBSCRIPTS => chr(0x62);
-use constant SUPERSCRIPTS => chr(0x70);
-use constant ASCII_DEFAULT => chr(0x73);
+use constant GREEK_SYMBOLS => chr(0x67);
+use constant SUBSCRIPTS => chr(0x62);
+use constant SUPERSCRIPTS => chr(0x70);
+use constant ASCII_DEFAULT => chr(0x73);
-use constant BASIC_ARABIC => chr(0x33);
-use constant EXTENDED_ARABIC => chr(0x34);
-use constant BASIC_LATIN => chr(0x42);
-use constant EXTENDED_LATIN => chr(0x45);
-use constant CJK => chr(0x31);
-use constant BASIC_CYRILLIC => chr(0x4E);
-use constant EXTENDED_CYRILLIC => chr(0x51);
-use constant BASIC_GREEK => chr(0x53);
-use constant BASIC_HEBREW => chr(0x32);
+use constant BASIC_ARABIC => chr(0x33);
+use constant EXTENDED_ARABIC => chr(0x34);
+use constant BASIC_LATIN => chr(0x42);
+use constant EXTENDED_LATIN => chr(0x45);
+use constant CJK => chr(0x31);
+use constant BASIC_CYRILLIC => chr(0x4E);
+use constant EXTENDED_CYRILLIC => chr(0x51);
+use constant BASIC_GREEK => chr(0x53);
+use constant BASIC_HEBREW => chr(0x32);
our %EXPORT_TAGS = ( all => [ qw(
- ESCAPE GREEK_SYMBOLS SUBSCRIPTS SUPERSCRIPTS ASCII_DEFAULT
- SINGLE_G0_A SINGLE_G0_B MULTI_G0_A MULTI_G0_B SINGLE_G1_A
- SINGLE_G1_B MULTI_G1_A MULTI_G1_B BASIC_ARABIC
- EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK BASIC_CYRILLIC
- EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW) ]);
+ ESCAPE GREEK_SYMBOLS SUBSCRIPTS SUPERSCRIPTS ASCII_DEFAULT
+ SINGLE_G0_A SINGLE_G0_B MULTI_G0_A MULTI_G0_B SINGLE_G1_A
+ SINGLE_G1_B MULTI_G1_A MULTI_G1_B BASIC_ARABIC
+ EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK BASIC_CYRILLIC
+ EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW) ]);
our @EXPORT_OK = qw(
- ESCAPE GREEK_SYMBOLS SUBSCRIPTS SUPERSCRIPTS ASCII_DEFAULT
- SINGLE_G0_A SINGLE_G0_B MULTI_G0_A MULTI_G0_B SINGLE_G1_A
- SINGLE_G1_B MULTI_G1_A MULTI_G1_B BASIC_ARABIC
- EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK BASIC_CYRILLIC
- EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW);
+ ESCAPE GREEK_SYMBOLS SUBSCRIPTS SUPERSCRIPTS ASCII_DEFAULT
+ SINGLE_G0_A SINGLE_G0_B MULTI_G0_A MULTI_G0_B SINGLE_G1_A
+ SINGLE_G1_B MULTI_G1_A MULTI_G1_B BASIC_ARABIC
+ EXTENDED_ARABIC BASIC_LATIN EXTENDED_LATIN CJK BASIC_CYRILLIC
+ EXTENDED_CYRILLIC BASIC_GREEK BASIC_HEBREW);
sub charset_name
{
Modified: packages/libmarc-charset-perl/trunk/t/escape1.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/escape1.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/escape1.t (original)
+++ packages/libmarc-charset-perl/trunk/t/escape1.t Fri May 18 22:53:45 2007
@@ -12,9 +12,9 @@
my $test =
'it is all greek ' .
- ESCAPE . GREEK_SYMBOLS . ## escape to Greek Symbols
- chr(0x61) . chr(0x62) . chr(0x63) . ## ALPHA BETA GAMMA
- ESCAPE . ASCII_DEFAULT. ## back to ASCII
+ ESCAPE . GREEK_SYMBOLS . ## escape to Greek Symbols
+ chr(0x61) . chr(0x62) . chr(0x63) . ## ALPHA BETA GAMMA
+ ESCAPE . ASCII_DEFAULT. ## back to ASCII
' to me';
my $expected =
@@ -28,23 +28,26 @@
## Subscripts
$test =
- 'subscript1' .
- ESCAPE . SUBSCRIPTS . ## escape to Subscripts
- chr(0x31) . ## subscript 1
- ESCAPE . ASCII_DEFAULT . ## back to ASCII
- 'subscript9' .
- ESCAPE . SUBSCRIPTS . ## escape to Subscripts
- chr(0x39) . ## subscript 9
- ESCAPE . ASCII_DEFAULT . ## back to ASCII
+ 'subscript1' .
+ ESCAPE . SUBSCRIPTS . ## escape to Subscripts
+ chr(0x31) . ## subscript 1
+ ESCAPE . ASCII_DEFAULT . ## back to ASCII
+ 'subscript9' .
+ ESCAPE . SUBSCRIPTS . ## escape to Subscripts
+ chr(0x39) . ## subscript 9
+ ESCAPE . ASCII_DEFAULT . ## back to ASCII
'subscript10' .
- ESCAPE . SUBSCRIPTS . ## back to Subscripts again
- chr(0x31) . chr(0x30) . ## subscript 10
- ESCAPE . ASCII_DEFAULT; ## back to ASCII
+ ESCAPE . SUBSCRIPTS . ## back to Subscripts again
+ chr(0x31) . chr(0x30) . ## subscript 10
+ ESCAPE . ASCII_DEFAULT; ## back to ASCII
$expected =
'subscript1' . chr(0x2081) .
'subscript9' . chr(0x2089) .
'subscript10' . chr(0x2081) . chr(0x2080);
+ # ucs 'subscript1' . chr(0xE28281) .
+ # ucs 'subscript9' . chr(0xE28289) .
+ # ucs 'subscript10' . chr(0xE28281) . chr(0xE28280);
is( marc8_to_utf8($test), $expected, 'Subscripts' );
@@ -53,22 +56,25 @@
$test =
'superscript1' .
- ESCAPE . SUPERSCRIPTS . ## escape to Superscripts
- chr(0x31) . ## superscript 1
- ESCAPE . ASCII_DEFAULT . ## back to ASCII
+ ESCAPE . SUPERSCRIPTS . ## escape to Superscripts
+ chr(0x31) . ## superscript 1
+ ESCAPE . ASCII_DEFAULT . ## back to ASCII
'superscript9' .
- ESCAPE . SUPERSCRIPTS . ## escape to Superscripts
- chr(0x39) . ## superscript 9
- ESCAPE . ASCII_DEFAULT . ## back to ASCII
+ ESCAPE . SUPERSCRIPTS . ## escape to Superscripts
+ chr(0x39) . ## superscript 9
+ ESCAPE . ASCII_DEFAULT . ## back to ASCII
'superscript10' .
ESCAPE . SUPERSCRIPTS .
- chr(0x31) . chr(0x30) . ## superscript 10
- ESCAPE . ASCII_DEFAULT; ## back to ASCII
+ chr(0x31) . chr(0x30) . ## superscript 10
+ ESCAPE . ASCII_DEFAULT; ## back to ASCII
$expected =
'superscript1' . chr(0x00B9) .
'superscript9' . chr(0x2079) .
'superscript10' . chr(0x00B9) . chr(0x2070);
+ # ucs 'superscript1' . chr(0xC2B9) .
+ # ucs 'superscript9' . chr(0xE281B9) .
+ # ucs 'superscript10' . chr(0xC2B9) . chr(0xE281B0);
is( marc8_to_utf8($test), $expected, 'Superscripts' );
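The "# ucs" comments added in this test appear to record the UTF-8 byte sequences of the expected code points (for example E2 82 81 for U+2081). A small sketch that checks this reading with the core Encode module; utf8_hex is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the UTF-8 encoding of one code point as an uppercase hex string.
sub utf8_hex {
    my ($codepoint) = @_;
    return uc(unpack('H*', encode('UTF-8', chr($codepoint))));
}

# Code points taken from the $expected strings in the test above.
for my $cp (0x2081, 0x2089, 0x2080, 0x00B9, 0x2079, 0x2070) {
    printf "U+%04X => %s\n", $cp, utf8_hex($cp);
}
```

Note that the tests compare against character strings built with chr() on the code point, so the byte values in the comments are informational only.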
Modified: packages/libmarc-charset-perl/trunk/t/escape2.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/escape2.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/escape2.t (original)
+++ packages/libmarc-charset-perl/trunk/t/escape2.t Fri May 18 22:53:45 2007
@@ -12,11 +12,11 @@
## test some ASCII & Greek mixed together
my $test =
- 'this is greek' . ## regular ASCII
+ 'this is greek' . ## regular ASCII
ESCAPE . SINGLE_G0_A . BASIC_GREEK . ## set G0 to Greek
- chr(0x49) . ## zeta
+ chr(0x49) . ## zeta
ESCAPE . SINGLE_G0_A . BASIC_LATIN . ## set G0 to ASCII
- 'this is not'; ## regular ASCII
+ 'this is not'; ## regular ASCII
my $expected = 'this is greek' . chr(0x0396) . 'this is not';
is(marc8_to_utf8($test), $expected, 'escape type 2 to Greek');
@@ -26,8 +26,8 @@
$test =
ESCAPE . SINGLE_G0_A . BASIC_ARABIC . ## set G0 to ArabicBasic
ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC. ## set G1 to ArabicExtended
- chr(0x4d) . ## HAH (from Basic)
- chr(0xBA); ## DUL (from Extended)
+ chr(0x4d) . ## HAH (from Basic)
+ chr(0xBA); ## DUL (from Extended)
$expected = chr(0x062D) . chr(0x068E);
is(marc8_to_utf8($test), $expected, 'escape type 2 to Basic+Ext Arabic');
@@ -37,10 +37,10 @@
$test =
ESCAPE . SINGLE_G0_A . BASIC_ARABIC . ## set G0 to ArabicBasic
ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC. ## set G1 to ArabicExtended
- chr(0x47) . ## ALEF (Arabic Basic)
+ chr(0x47) . ## ALEF (Arabic Basic)
ESCAPE . SINGLE_G0_A . BASIC_HEBREW . ## replace ArabicBasic with Hebrew
- chr(0x71) . ## SAMEKH (Hebrew)
- chr(0xE9); ## RNOON (ArabicExtended)
+ chr(0x71) . ## SAMEKH (Hebrew)
+ chr(0xE9); ## RNOON (ArabicExtended)
$expected = chr(0x0627) . chr(0x05E1) . chr(0x06BB);
is(marc8_to_utf8($test), $expected, 'escape type 2 Arabic + Hebrew mixed');
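For readers unfamiliar with the escape sequences these tests build, here is a short sketch of the bytes involved when G0 is switched to Greek and back. The constant values are copied from lib/MARC/Charset/Constants.pm as shown in this commit; the helper in_g0 is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Values copied from lib/MARC/Charset/Constants.pm in this commit.
use constant ESCAPE      => chr(0x1B);
use constant SINGLE_G0_A => chr(0x28);
use constant BASIC_GREEK => chr(0x53);
use constant BASIC_LATIN => chr(0x42);

# Hypothetical helper: wrap a byte string in "set G0 to $charset"
# and "set G0 back to Basic Latin" escape sequences.
sub in_g0 {
    my ($charset, $bytes) = @_;
    return ESCAPE . SINGLE_G0_A . $charset . $bytes
         . ESCAPE . SINGLE_G0_A . BASIC_LATIN;
}

my $marc8 = in_g0(BASIC_GREEK, chr(0x49));
print unpack('H*', $marc8), "\n";   # prints 1b2853491b2842
```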
Modified: packages/libmarc-charset-perl/trunk/t/utf8.t
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libmarc-charset-perl/trunk/t/utf8.t?rev=5466&op=diff
==============================================================================
--- packages/libmarc-charset-perl/trunk/t/utf8.t (original)
+++ packages/libmarc-charset-perl/trunk/t/utf8.t Fri May 18 22:53:45 2007
@@ -22,35 +22,35 @@
is(
utf8_to_marc8(chr(0x0628)),
ESCAPE . SINGLE_G0_A . BASIC_ARABIC . chr(0x48) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . ASCII_DEFAULT,
'Basic Arabic'
);
is(
utf8_to_marc8(chr(0x068D)),
ESCAPE . SINGLE_G1_A . EXTENDED_ARABIC . chr(0xB9) .
- ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
+ ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
'Extended Arabic'
);
is(
utf8_to_marc8(chr(0x0440)),
ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x52) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . ASCII_DEFAULT,
'Basic Cyrillic'
);
is(
utf8_to_marc8(chr(0x0408)),
ESCAPE . SINGLE_G1_A . EXTENDED_CYRILLIC . chr(0xE8) .
- ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
+ ESCAPE . SINGLE_G1_A . EXTENDED_LATIN,
'Extended Cyrillic'
);
is(
utf8_to_marc8(chr(0x0398)),
ESCAPE . SINGLE_G0_A . BASIC_GREEK . chr(0x4B) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . ASCII_DEFAULT,
'Greek'
);
@@ -60,7 +60,7 @@
is(
utf8_to_marc8(chr(0x05E0)),
ESCAPE . SINGLE_G0_A . BASIC_HEBREW . chr(0x70) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . ASCII_DEFAULT,
'Hebrew'
);
@@ -77,7 +77,7 @@
is(
utf8_to_marc8(chr(0x71AC)),
ESCAPE . MULTI_G0_A . CJK . chr(0x21) . chr(0x49) . chr(0x7C) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . ASCII_DEFAULT,
'East Asian'
);
@@ -90,7 +90,8 @@
);
is(
- utf8_to_marc8('abc' . chr(0x0327) . chr(0x0300) . chr(0x0301) . 'def'),
+ utf8_to_marc8('abc' . chr(0x0327) . chr(0x0300) . chr(0x0301)
+ . 'def'),
'ab' . chr(0xF0) . chr(0xE1) . chr(0xE2) . 'cdef',
'string with multiple interior combining characters'
);
@@ -101,7 +102,7 @@
is(
utf8_to_marc8(chr(0x043A)),
ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4B) .
- ESCAPE . ASCII_DEFAULT ,
+ ESCAPE . ASCII_DEFAULT ,
'CYRILLIC SMALL LETTER KA'
);
@@ -109,8 +110,8 @@
is(
utf8_to_marc8(chr(0x05D0) . chr(0x043B)),
ESCAPE . SINGLE_G0_A . BASIC_HEBREW . chr(0x60) .
- ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4C) .
- ESCAPE . ASCII_DEFAULT,
+ ESCAPE . SINGLE_G0_A . BASIC_CYRILLIC . chr(0x4C) .
+ ESCAPE . ASCII_DEFAULT,
'string with multiple character sets'
);
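The "multiple interior combining characters" test above relies on the fact that MARC-8 places combining marks before the base character, whereas Unicode places them after it. A rough sketch of that reordering on the Unicode side only, assuming the input needs nothing beyond NFD decomposition; marks_before_base is a hypothetical helper and is not how MARC::Charset is necessarily implemented.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose, then move each run of combining marks
# in front of the base character it follows.
sub marks_before_base {
    my ($text) = @_;
    my $nfd = NFD($text);
    $nfd =~ s/(\P{M})(\p{M}+)/$2$1/g;
    return $nfd;
}

my $in  = 'abc' . chr(0x0327) . chr(0x0300) . chr(0x0301) . 'def';
my $out = marks_before_base($in);
print join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $out)), "\n";
```

The resulting order (a, b, the three marks, then c, d, e, f) matches the order of the expected MARC-8 string in the test, where 0xF0, 0xE1 and 0xE2 precede the letter c.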