r5561 - in /packages/libtext-csv-perl/trunk: CSV_XS.pm CSV_XS.xs ChangeLog MANIFEST META.yml README debian/changelog debian/control t/45_eol.t
eloy at users.alioth.debian.org
Fri Jun 1 14:48:56 UTC 2007
Author: eloy
Date: Fri Jun 1 14:48:56 2007
New Revision: 5561
URL: http://svn.debian.org/wsvn/pkg-perl/?sc=1&rev=5561
Log:
new upstream version
Added:
packages/libtext-csv-perl/trunk/t/45_eol.t
- copied unchanged from r5560, packages/libtext-csv-perl/branches/upstream/current/t/45_eol.t
Modified:
packages/libtext-csv-perl/trunk/CSV_XS.pm
packages/libtext-csv-perl/trunk/CSV_XS.xs
packages/libtext-csv-perl/trunk/ChangeLog
packages/libtext-csv-perl/trunk/MANIFEST
packages/libtext-csv-perl/trunk/META.yml
packages/libtext-csv-perl/trunk/README
packages/libtext-csv-perl/trunk/debian/changelog
packages/libtext-csv-perl/trunk/debian/control
Modified: packages/libtext-csv-perl/trunk/CSV_XS.pm
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/CSV_XS.pm?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/CSV_XS.pm (original)
+++ packages/libtext-csv-perl/trunk/CSV_XS.pm Fri Jun 1 14:48:56 2007
@@ -28,7 +28,7 @@
use DynaLoader ();
use vars qw( $VERSION @ISA );
-$VERSION = "0.26";
+$VERSION = "0.27";
@ISA = qw( DynaLoader );
sub PV () { 0 }
@@ -191,7 +191,7 @@
{
my ($self, $idx, $val) = @_;
ref $self->{_FFLAGS} &&
- $idx >= 0 && $idx < @{$self->{_FFLAGS}} or return undef;
+ $idx >= 0 && $idx < @{$self->{_FFLAGS}} or return;
$self->{_FFLAGS}[$idx] & 0x0001 ? 1 : 0;
} # is_quoted
@@ -199,7 +199,7 @@
{
my ($self, $idx, $val) = @_;
ref $self->{_FFLAGS} &&
- $idx >= 0 && $idx < @{$self->{_FFLAGS}} or return undef;
+ $idx >= 0 && $idx < @{$self->{_FFLAGS}} or return;
$self->{_FFLAGS}[$idx] & 0x0002 ? 1 : 0;
} # is_binary
@@ -317,6 +317,43 @@
comma-separated values. An instance of the Text::CSV_XS class can combine
fields into a CSV string and parse a CSV string into fields.
+The module accepts either strings or files as input and can utilize any
+user-specified characters as delimiters, separators, and escapes so it is
+perhaps better called ASV (anything separated values) rather than just CSV.
+
+=head2 Embedded newlines
+
+B<Important Note>: The default behaviour is to only accept ASCII characters.
+This means that fields can not contain newlines. If your data contains
+newlines embedded in fields, or characters above 0x7e (tilde), or binary data,
+you *must* set C<< binary => 1 >> in the call to C<new ()>. To cover the widest
+range of parsing options, you will always want to set binary.
+
+But you still have the problem that you have to pass a correct line to the
+C<parse ()> method, which is more complicated than the usual usage
+suggests:
+
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ while (<>) {
+ $csv->parse ($_);
+ my @fields = $csv->fields ();
+
+will break, as the while loop might read broken lines: it does not care
+about the quoting. If you need to support embedded newlines, the way to go
+is either
+
+ use IO::Handle;
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ while (my $row = $csv->getline (*ARGV)) {
+ my @fields = @$row;
+
+or, more safely in perl 5.6 and up
+
+ my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
+ open my $io, "<", $file or die "$file: $!";
+ while (my $row = $csv->getline ($io)) {
+ my @fields = @$row;
+
=head1 FUNCTIONS
=over 4
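Editorial sketch, not part of r5561: the embedded-newlines note above says that a plain line reader can split a record in the middle of a quoted field. The C fragment below is a minimal illustration of that point, assuming the default case where quote_char and escape_char are the same character. The helper name record_length is hypothetical and is not taken from CSV_XS.xs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the first logical record in
 * buf, including its terminating '\n', or 0 if the buffer ends before
 * the record is complete. Newlines inside a quoted field count as data.
 * Assumes escape_char == quote_char, so a doubled quote simply toggles
 * the quoted state twice and leaves it unchanged.
 */
static size_t record_length(const char *buf, size_t len, char quote)
{
    int in_quotes = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == quote)
            in_quotes = !in_quotes;
        else if (buf[i] == '\n' && !in_quotes)
            return i + 1;           /* first newline outside quotes */
    }
    return 0;                       /* record not finished yet */
}
```

For the input `1,"a\nb",2\n3,4\n` this helper reports a 10-byte first record, whereas splitting on the first newline would stop after 5 bytes and hand an unterminated quoted field to the parser.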
@@ -347,10 +384,25 @@
default), C<"\012"> (Line Feed) or C<"\015\012"> (Carriage Return,
Line Feed)
+If both C<$/> and C<eol> equal C<"\015">, lines that end on only a
+Carriage Return without Line Feed will be C<parse>d correctly.
+Line endings, whether in C<$/> or C<eol>, other than C<undef>,
+C<"\n">, C<"\r\n">, or C<"\r"> are not (yet) supported for parsing.
+
=item escape_char
-The char used for escaping certain characters inside quoted fields,
-by default the same character. (C<">)
+The character used for escaping certain characters inside quoted fields.
+
+The C<escape_char> defaults to being the literal double-quote mark (C<">)
+in other words, the same as the default C<quote_char>. This means that
+doubling the quote mark in a field escapes it:
+
+ "foo","bar","Escape ""quote mark"" with two ""quote marks""","baz"
+
+If you change the default quote_char without changing the default
+escape_char, the escape_char will still be the quote mark. If instead
+you want to escape the quote_char by doubling it, you will need to change
+the escape_char to be the same as what you changed the quote_char to.
The escape character can not be equal to the separation character.
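Editorial sketch, not part of r5561: the escape_char text above describes escaping a quote mark by doubling it when escape_char equals quote_char. The C fragment below is one way to picture that rule on the output side. The function name quote_field and its signature are hypothetical, and the code only covers the case where both characters are the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write `in` to `out` as a quoted field, doubling
 * every occurrence of `quote` (the escape_char == quote_char case).
 * Returns the number of bytes written, excluding the trailing NUL,
 * or -1 if `out` (capacity `cap`) is too small.
 */
static int quote_field(const char *in, char quote, char *out, size_t cap)
{
    size_t len, i, n = 0;

    assert(in != NULL && out != NULL);
    if (cap < 3)                    /* two quotes plus NUL at minimum */
        return -1;
    len = strlen(in);
    out[n++] = quote;
    for (i = 0; i < len; i++) {
        size_t need = (in[i] == quote) ? 2 : 1;
        if (n + need + 2 > cap)     /* keep room for closing quote + NUL */
            return -1;
        if (in[i] == quote)
            out[n++] = quote;       /* escape by doubling */
        out[n++] = in[i];
    }
    out[n++] = quote;
    out[n] = '\0';
    return (int)n;
}
```

Calling it with a single quote as `quote` doubles single quotes instead, which mirrors the advice above to set escape_char to the new quote_char if doubling is wanted.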
@@ -440,7 +492,7 @@
to the I<$io> object, typically an IO handle or any other object that
offers a I<print> method. Note, this implies that the following is wrong:
- open FILE, ">whatever";
+ open FILE, ">", "whatever";
$status = $csv->print (\*FILE, $colref);
The glob C<\*FILE> is not an object, thus it doesn't have a print
@@ -693,11 +745,17 @@
=head1 TODO
+=over 2
+
+=item eol
+
Discuss an option to make the eol honor the $/ setting. Maybe
my $csv = Text::CSV_XS->new ({ eol => $/ });
is already enough, and new options only make things less opaque.
+
+=item setting meta info
Future extensions might include extending the C<fields_flags ()>,
C<is_quoted ()>, and C<is_binary ()> to accept setting these flags
@@ -707,6 +765,8 @@
$csv->meta_info (0, 1, 1, 3, 0, 0);
$csv->is_quoted (3, 1);
+=item parse returning undefined fields
+
Adding an option that enables the parser to distinguish between
empty fields and undefined fields, like
@@ -718,6 +778,55 @@
Then would return (undef, "", "1", "2", undef, "") in @fld, instead
of the current ("", "", "1", "2", "", "").
+=item combined methods
+
+Adding means (methods) that combine C<combine ()> and C<string ()> in
+a single call. Likewise for C<parse ()> and C<fields ()>. Given the
+trouble with embedded newlines, maybe just allowing C<getline ()> and
+C<print ()> is sufficient.
+
+=item Unicode
+
+Make C<parse ()> and C<combine ()> do the right thing for Unicode
+(UTF-8) if requested. See t/50_utf8.t. More complicated, but equally
+important, also for C<getline ()> and C<print ()>.
+
+=item Space delimited separators
+
+Discuss if and how C<Text::CSV_XS> should/could support formats like
+
+ 1 , "foo" , "bar" , 3.19 ,
+
+=item Double double quotes
+
+There seem to be applications around that write their dates like
+
+ 1,4,""12/11/2004"",4,1
+
+If we would support that, in what way?
+
+=item Parse the whole file at once
+
+Implement a new method that enables the parsing of a complete file
+at once, returning a list of hashes. A possible extension to this could
+be to enable a column selection on the call:
+
+ my @AoH = $csv->parse_file ($filename, { cols => [ 1, 4..8, 12 ]});
+
+Returning something like
+
+ [ { fields => [ 1, 2, "foo", 4.5, undef, "", 8 ],
+ flags => [ ... ],
+ errors => [ ... ],
+ },
+ { fields => [ ... ],
+ .
+ .
+ },
+ ]
+
+=back
+
=head1 SEE ALSO
L<perl(1)>, L<IO::File(3)>, L<IO::Wrap(3)>, L<Spreadsheet::Read(3)>
@@ -732,10 +841,13 @@
Jochen Wiedmann F<E<lt>joe at ispsoft.deE<gt>> rewrote the encoding and
decoding in C by implementing a simple finite-state machine and added
the variable quote, escape and separator characters, the binary mode
-and the print and getline methods.
-
-H.Merijn Brand F<E<lt>h.m.brand at xs4all.nlE<gt>> cleaned up the code
-and added the field flags methods.
+and the print and getline methods. See ChangeLog releases 0.10 through
+0.23.
+
+H.Merijn Brand F<E<lt>h.m.brand at xs4all.nlE<gt>> cleaned up the code,
+added the field flags methods, wrote the major part of the test suite,
+completed the documentation, fixed some RT bugs. See ChangeLog releases
+0.25 and on.
=head1 COPYRIGHT AND LICENSE
Modified: packages/libtext-csv-perl/trunk/CSV_XS.xs
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/CSV_XS.xs?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/CSV_XS.xs (original)
+++ packages/libtext-csv-perl/trunk/CSV_XS.xs Fri Jun 1 14:48:56 2007
@@ -9,9 +9,11 @@
#include <XSUB.h>
#include "ppport.h"
-#define CSV_XS_TYPE_PV 0
-#define CSV_XS_TYPE_IV 1
-#define CSV_XS_TYPE_NV 2
+#define MAINT_DEBUG 0
+
+#define CSV_XS_TYPE_PV 0
+#define CSV_XS_TYPE_IV 1
+#define CSV_XS_TYPE_NV 2
#define CSV_FLAGS_QUO 0x0001
#define CSV_FLAGS_BIN 0x0002
@@ -41,6 +43,9 @@
SV *tmp;
char *types;
STRLEN types_len;
+ char *eol;
+ STRLEN eol_len;
+ int eol_is_cr;
} csv_t;
#define bool_opt(o) \
@@ -81,6 +86,15 @@
STRLEN len;
csv->types = SvPV (*svp, len);
csv->types_len = len;
+ }
+ csv->eol = NULL;
+ csv->eol_is_cr = 0;
+ if ((svp = hv_fetch (self, "eol", 3, 0)) && *svp && SvOK (*svp)) {
+ STRLEN len;
+ csv->eol = SvPV (*svp, len);
+ csv->eol_len = len;
+ if (len == 1 && *csv->eol == '\015')
+ csv->eol_is_cr = 1;
}
csv->binary = bool_opt ("binary");
@@ -206,9 +220,12 @@
return TRUE;
} /* Combine */
-static void ParseError (csv_t *csv)
+static void ParseError (csv_t *csv, int ln)
{
if (csv->tmp) {
+#if MAINT_DEBUG
+ fprintf (stderr, "# Parse error on line %d: '%s'\n", ln, csv->tmp);
+#endif
if (hv_store (csv->self, "_ERROR_INPUT", 12, csv->tmp, 0))
SvREFCNT_inc (csv->tmp);
}
@@ -242,12 +259,12 @@
#define ERROR_INSIDE_QUOTES { \
SvREFCNT_dec (insideQuotes); \
- ParseError (csv); \
+ ParseError (csv, __LINE__); \
return FALSE; \
}
#define ERROR_INSIDE_FIELD { \
SvREFCNT_dec (insideField); \
- ParseError (csv); \
+ ParseError (csv, __LINE__); \
return FALSE; \
}
@@ -306,7 +323,7 @@
}
}
else
- if (c == '\012') {
+ if (c == '\012') { /* \n */
if (waitingForField) {
av_push (fields, newSVpv ("", 0));
if (csv->flags)
@@ -327,9 +344,16 @@
}
}
else
- if (c == '\015') {
+ if (c == '\015') { /* \r */
if (waitingForField) {
- int c2 = CSV_GET;
+ int c2;
+
+ if (csv->eol_is_cr) {
+ c = '\012';
+ goto restart;
+ }
+
+ c2 = CSV_GET;
if (c2 == EOF) {
insideField = newSVpv ("", 0);
@@ -356,7 +380,14 @@
CSV_PUT_SV (insideQuotes, c);
}
else {
- int c2 = CSV_GET;
+ int c2;
+
+ if (csv->eol_is_cr) {
+ AV_PUSH (insideField);
+ return TRUE;
+ }
+
+ c2 = CSV_GET;
if (c2 == '\012') {
AV_PUSH (insideField);
@@ -390,19 +421,23 @@
return TRUE;
if (c2 == '\015') {
- int c3 = CSV_GET;
-
+ int c3;
+
+ if (csv->eol_is_cr)
+ return TRUE;
+
+ c3 = CSV_GET;
if (c3 == '\012')
return TRUE;
- ParseError (csv);
+ ParseError (csv, __LINE__);
return FALSE;
}
if (c2 == '\012')
return TRUE;
- ParseError (csv);
+ ParseError (csv, __LINE__);
return FALSE;
}
@@ -431,7 +466,14 @@
else {
if (c2 == '\015') {
- int c3 = CSV_GET;
+ int c3;
+
+ if (csv->eol_is_cr) {
+ AV_PUSH (insideQuotes);
+ return TRUE;
+ }
+
+ c3 = CSV_GET;
if (c3 == '\012') {
AV_PUSH (insideQuotes);
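Editorial sketch, not part of r5561: the CSV_XS.xs hunks above set eol_is_cr when the eol attribute is exactly one Carriage Return, and then let a CR end the record without reading a further character. The C fragment below condenses that decision into one place. The names eol_opts, set_eol, cr_kind and classify_cr are hypothetical, and the sketch ignores the parser states (quoted fields, end of input) that the real code distinguishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the eol-related members added to csv_t. */
typedef struct {
    const char *eol;
    size_t      eol_len;
    int         eol_is_cr;
} eol_opts;

/* Mirror of the option setup: the flag is set only for eol == "\015". */
static void set_eol(eol_opts *o, const char *eol, size_t len)
{
    assert(o != NULL);
    o->eol       = eol;
    o->eol_len   = eol ? len : 0;
    o->eol_is_cr = (eol != NULL && len == 1 && eol[0] == '\015');
}

enum cr_kind { CR_ENDS_RECORD, CR_STARTS_CRLF, CR_UNSUPPORTED };

/*
 * Decide what an unquoted CR means, given the character that follows it.
 * With eol_is_cr set, no look-ahead is needed: the CR ends the record.
 */
static enum cr_kind classify_cr(const eol_opts *o, int next)
{
    assert(o != NULL);
    if (o->eol_is_cr)
        return CR_ENDS_RECORD;
    if (next == '\012')
        return CR_STARTS_CRLF;
    return CR_UNSUPPORTED;          /* simplified: reported as an error here */
}
```

The point of the flag is that the look-ahead read can be skipped entirely when eol is a lone CR, so the character after the CR stays available as the start of the next record.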
Modified: packages/libtext-csv-perl/trunk/ChangeLog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/ChangeLog?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/ChangeLog (original)
+++ packages/libtext-csv-perl/trunk/ChangeLog Fri Jun 1 14:48:56 2007
@@ -1,3 +1,17 @@
+2007-05-31 0.27 - H.Merijn Brand <h.m.brand at xs4all.nl>
+
+ * checked with perlcritic (still works under 5.00504)
+ so 3-arg open cannot be used (except in the docs)
+ * 3-arg open in docs too
+ * Added a lot to the TODO list
+ * Some more info on using escape character (jZed)
+ * Mention Text::CSV_PP in README
+ * Added t/45_eol.t, eol tests
+ * Added a section about embedded newlines in the pod
+ * Allow \r as eol ($/) for parsing
+ * More docs for eol
+ * More eol = \r fixes, tfrayner's test case added to t/45_eol.t
+
2007-05-15 0.26 - H.Merijn Brand <h.m.brand at xs4all.nl>
* Add $csv->allow_undef (1) suggestion in TODO
Modified: packages/libtext-csv-perl/trunk/MANIFEST
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/MANIFEST?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/MANIFEST (original)
+++ packages/libtext-csv-perl/trunk/MANIFEST Fri Jun 1 14:48:56 2007
@@ -13,6 +13,7 @@
t/20_file.t IO tests (print and getline)
t/30_types.t Tests for the "types" attribute.
t/40_misc.t Binary mode tests
+t/45_eol.t Embedded EOL
t/50_utf8.t Unicode stress tests
t/55_combi.t Different CSV character combinations
t/60_samples.t Miscellaneous problems from the modules history.
Modified: packages/libtext-csv-perl/trunk/META.yml
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/META.yml?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/META.yml (original)
+++ packages/libtext-csv-perl/trunk/META.yml Fri Jun 1 14:48:56 2007
@@ -1,6 +1,6 @@
--- #YAML:1.0
name: Text-CSV_XS
-version: 0.26
+version: 0.27
abstract: Comma-Separated Values manipulation routines
license: perl
generated_by: ExtUtils::MakeMaker version 6.32
Modified: packages/libtext-csv-perl/trunk/README
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/README?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/README (original)
+++ packages/libtext-csv-perl/trunk/README Fri Jun 1 14:48:56 2007
@@ -28,3 +28,6 @@
Jochen Wiedmann <joe at ispsoft.de>
Interface design by Alan Citterman <alan at mfgrtl.com>
+
+ A pure-perl version is being maintained by Makamaka Hannyaharamitu
+ as Text::CSV_PP, which tries to follow Text::CSV_XS very closely.
Modified: packages/libtext-csv-perl/trunk/debian/changelog
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/debian/changelog?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/debian/changelog (original)
+++ packages/libtext-csv-perl/trunk/debian/changelog Fri Jun 1 14:48:56 2007
@@ -1,8 +1,9 @@
-libtext-csv-perl (0.26-2) UNRELEASED; urgency=low
+libtext-csv-perl (0.27-1) unstable; urgency=low
- * NOT RELEASED YET
+ * New upstream release
+ * debian/control: added me to Uploaders
- -- Damyan Ivanov <dmn at debian.org> Tue, 22 May 2007 12:20:32 +0300
+ -- Krzysztof Krzyzaniak (eloy) <eloy at debian.org> Fri, 01 Jun 2007 16:47:47 +0200
libtext-csv-perl (0.26-1) unstable; urgency=low
Modified: packages/libtext-csv-perl/trunk/debian/control
URL: http://svn.debian.org/wsvn/pkg-perl/packages/libtext-csv-perl/trunk/debian/control?rev=5561&op=diff
==============================================================================
--- packages/libtext-csv-perl/trunk/debian/control (original)
+++ packages/libtext-csv-perl/trunk/debian/control Fri Jun 1 14:48:56 2007
@@ -1,6 +1,6 @@
Source: libtext-csv-perl
Maintainer: Debian Perl Group <pkg-perl-maintainers at lists.alioth.debian.org>
-Uploaders: Gunnar Wolf <gwolf at debian.org>, Niko Tyni <ntyni at iki.fi>, gregor herrmann <gregor+debian at comodo.priv.at>
+Uploaders: Gunnar Wolf <gwolf at debian.org>, Niko Tyni <ntyni at iki.fi>, gregor herrmann <gregor+debian at comodo.priv.at>, Krzysztof Krzyzaniak (eloy) <eloy at debian.org>
Section: perl
Priority: optional
Standards-Version: 3.7.2