Bug#325090: libparse-debian-packages-perl: Incorrect parsing of Packages, adding parsing of Sources plus constructor imrovement

Daniel 'NebuchadnezzaR' Dehennin nebuchadnezzar at asgardr.info
Fri Aug 26 02:55:14 UTC 2005


Package: libparse-debian-packages-perl
Version: 0.01-1
Severity: normal

Hello,

Using this package to parse my mirror Packages and Sources I found
some problems:

- packages with a Homepage: at the end of the body make the next
  package have a ' Homepage' key,

- more generaly: a packages with a "word:" at the begining of a line
  in long description cause that package to have an extra key,

I add some code to make it possible to pass a string to the
contructor, this one is in chage to figure what type of file it is
(plain/text, gzip or bzip2) and open it correctly.

I had the possibility to parse a Sources file (maybe the lib should
change its name ?) adding a Files key:

$source{Files}{$filename} = { size =>
                              MD5sum =>
			    }

Now a parser have a __readline "hidden" methode, so "next" methode can
get a line of a package without figuring how to do it (getline fo
FileHandle, gzreadline or bzreadline).

This version is full compatible with version 0.01.

If a caller pass a filename, new() can return undef if the file does
not exist of is not in supported format or can not be open.
Note that this version depend on:

use Compress::Zlib;
use Compress::Bzip2;
use File::MMagic;
use FileHandle;

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (990, 'unstable')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/dash
Kernel: Linux 2.6.12+thorr.2
Locale: LANG=fr_FR at euro, LC_CTYPE=fr_FR at euro (charmap=ISO-8859-15)

Versions of packages libparse-debian-packages-perl depends on:
ii  libyaml-perl                  0.38-2     YAML Ain't Markup Language (tm)
ii  perl                          5.8.7-4    Larry Wall's Practical Extraction 

libparse-debian-packages-perl recommends no packages.

-- no debconf information
-------------- next part --------------
--- Packages.pm.old	2005-08-26 02:58:55.000000000 +0200
+++ Packages.pm	2005-08-26 04:53:37.000000000 +0200
@@ -1,25 +1,67 @@
 use strict;
 package Parse::Debian::Packages;
-our $VERSION = '0.01';
+our $VERSION = "0.02";
+
+use Compress::Zlib;
+use Compress::Bzip2;
+use File::MMagic;
+use FileHandle;
 
 sub new {
     my $class = shift;
-    my $fh = shift;
+    my $file = shift;
+    my $fh;
 
-    return bless { fh => $fh }, $class;
+    if (! ref $file) {
+	# Caller give us a filename
+	return undef unless -f $file;
+
+	# Default magic is ok for application/x-gzip application/x-bzip2 and text/plain
+	my $magic = File::MMagic->new();
+	my $type = $magic->checktype_filename($file);
+
+      SWITCH: for ($type) {
+	  /text\/plain/ && do {
+	      $fh = new FileHandle;
+	      if (! $fh->open("< $file")) {
+		  return undef;
+	      }
+	      last;
+	  };
+	  
+	  /application\/x-gzip/ && do {
+	      $fh = gzopen ($file, "rb")
+		  return undef;
+	      last;
+	  };
+	  
+	  /application\/x-bzip2/ && do {
+	      $fh = bzopen ($file, "rb")
+		  return undef;
+	      last;
+	  };
+	  # It's not a supported file format
+	  return undef;
+      }
+    	return bless { FH => $fh, TYPE => $type}, $class;
+    } else {
+    	return bless { FH => $file, TYPE => "IOFile"}, $class;
+    }
 }
 
 sub next {
     my $self = shift;
-    my $fh   = $self->{fh};
 
     my %parsed;
-    while (<$fh>) {
+    while ($_ = $self->__readline) {
         last if /^$/;
-        if (my ($key, $value) = m/^(.*): (.*)/) {
-            $parsed{$key} = $value;
-        }
-        else {
+
+        if (my ($key, $value) = m/^([^\s:]*):\s?(.*)/) {
+	    # Do not add an empty Files key when parsing Sources
+            $parsed{$key} = $value unless $key eq "Files";
+        } elsif (my ($md5, $size, $filename) = /^\s(\w{32})\s(\d+)\s(.*)/) {
+	    $parsed{Files} = { $filename => { size => $size, MD5sum => $md5 } };
+	} else {
             s/ //;
             s/^\.$//;
             $parsed{body} .= $_;
@@ -29,7 +71,37 @@
     return %parsed;
 }
 
-1;
+sub __readline {
+    my $self = shift;
+    my $line = "";
+
+      SWITCH: for ($self->{TYPE}) {
+	  /text\/plain|IOFile/ && do {
+	      $line = $self->{FH}->getline;
+	      last;
+	  };
+	  
+	  /application\/x-gzip/ && do {
+	      my $bytesread = $self->{FH}->gzreadline($line);
+	      if ($bytesread == 0) {
+		  $line = "";
+	      }
+	      last;
+	  };
+	  
+	  /application\/x-bzip2/ && do {
+	      my $bytesread = $self->{FH}->bzreadline($line);
+	      if ($bytesread == 0) {
+		  $line = "";
+	      }
+	      last;
+	  };
+	  die "Should Never Happend\n";
+      }
+    return $line;
+}
+
+1
 
 
 =head1 NAME
@@ -40,24 +112,48 @@
 
  use YAML;
  use IO::File;
+ use FileHandle;
  use Parse::Debian::Packages;
- my $fh = IO::File->new("Packages");
 
- my $parser = Parse::Debian::Packages->new( $fh );
- while (my %package = $parser->next) {
+ my $pkg_file = "Packages";
+ my $src_file = "Sources";
+ my $other_src_file = "Sources.bz2";
+
+ my $fh_io = IO::File->new($pkg_file);
+ my $fh_FH = new FileHandle;
+ $fh_FH->open("< $src_file");
+
+ my $parser_on_io = Parse::Debian::Packages->new( $fh_io );
+ my $parser_on_FH = Parse::Debian::Packages->new( $fh_FH );
+ my $parser_on_filename = Parse::Debian::Packages->new( $other_src_file );
+
+ my %pkg_with_io = $parser_on_io->next;
+ my %pkg_with_FH = $parser_on_FH->next;
+ my %pkg_with_filename = $parser_on_filename->next;
+
+ print Dump \%pkg_with_io;
+ print Dump \%pkg_with_FH;
+ print Dump \%pkg_with_filename;
+
+ while (my %package = $parser_on_io->next) {
      print Dump \%package;
  }
 
 =head1 DESCRIPTION
 
-This module parses the Packages files used by the debian package
-management tools.
+This module parses the Packages and Sources files used by the debian
+package management tools.
 
 It presents itself as an iterator.  Each call of the ->next method
 will return the next package found in the file.
 
-For laziness, we take a filehandle in to the constructor.  Please open
-the file for us.
+You can pass a FileHandle to the constructor of a filename, the
+advantage of the filename is that you can parse plain/text, gziped or
+bziped files.
+
+If the filename passed to the constructor don't repressent a file in
+supported format (text/plain, application/x-gzip,
+application/x-bzip2) or if that file can not be open, new() return undef.
 
 =head1 AUTHOR
 


More information about the pkg-perl-maintainers mailing list