[Python-apps-team] Bug#644444: pkpgcounter does not properly handle all postscript documents with copies or n-up options

Wed Oct 5 23:05:48 UTC 2011

Package: pkpgcounter
Version: 3.50-7
Severity: normal


As stated in the subject, the native parser in pkpgcounter's
postscript.py trusts files to be DSC compliant a little too much.  It
doesn't properly handle documents printed n-up or with certain copies
options.

For example, I had a document that was printed from MS Word as 9 copies
of 2 pages.  That produced postscript containing 9 duplicates of the 2
pages, one of which also carried a 9 copies tag.  pkpgcounter counted
this as 162 pages (9*9*2) rather than 18.
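
The arithmetic behind that overcount can be sketched as follows.  This
only restates the numbers from the report; the variable names are made
up for illustration and are not taken from pkpgcounter.

```python
# Numbers from the report above; names are hypothetical, not pkpgcounter's.
copies_requested = 9
pages_per_copy = 2

# MS Word already emitted every copy as real pages in the PostScript stream.
pages_in_stream = copies_requested * pages_per_copy

# Multiplying the streamed pages by the copies tag counts the copies twice.
overcounted = pages_in_stream * copies_requested

print(pages_in_stream)  # 18
print(overcounted)      # 162
```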

Similarly, for n-up documents it counts the individual logical pages
rather than the physical pages.
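
As a rough illustration of the expected number for n-up jobs, assuming
every physical page is filled with up to "nup" logical pages, the count
would be something like the sketch below.  The helper name is
hypothetical; the attached diff does not compute this itself, it only
defers to ghostscript.

```python
import math

def physical_pages(logical_pages, nup):
    """Expected physical page count for an n-up job (illustrative only)."""
    if nup < 1:
        raise ValueError("nup must be at least 1")
    # float() keeps the division exact on both Python 2 and Python 3.
    return int(math.ceil(logical_pages / float(nup)))

print(physical_pages(8, 2))  # 4
print(physical_pages(5, 4))  # 2
```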

I have attached a diff that I have been testing.  It simply attempts to
detect either of those cases and then falls back to the ghostscript
rendering method.
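
The detection step can be sketched on its own roughly like this.  The
regular expression is the one used in the attached diff; the function
name and its arguments are hypothetical and only stand in for the
"notrust" flag that the real parser sets while it walks the file.

```python
import re

# Same expression as in the diff: a number, then the "scale" operator,
# but not "scalefont".
SCALE_PATTERN = re.compile(r"[0-9].*\s+scale([^f]|$)")

def needs_ghostscript(lines, max_copies):
    """Return True when the native count should not be trusted."""
    # A copies tag above 1 is left to ghostscript to sort out.
    if max_copies > 1:
        return True
    # The scale operator is often a sign of n-up output.  This may also
    # match ordinary scaling, which only costs an extra ghostscript run.
    return any(SCALE_PATTERN.search(line.strip()) for line in lines)
```

For instance, a line such as "0.5 0.5 scale" would trigger the fallback,
while "/Helvetica findfont 12 scalefont setfont" would not.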

I attempted to pass this info to the original dev, but haven't gotten 
any response.

Let me know if you need any more details or the sample postscript.

Thanks,
Brian

-- System Information:
Debian Release: 6.0.2
  APT prefers stable
  APT policy: (500, 'stable'), (120, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages pkpgcounter depends on:
ii  ghostscript             8.71~dfsg2-9     The GPL Ghostscript PostScript/PDF
ii  python                  2.6.6-3+squeeze6 interactive high-level object-orie
ii  python-imaging          1.1.7-2          Python Imaging Library
ii  python-support          1.0.10           automated rebuilding support for P

Versions of packages pkpgcounter recommends:
pn  imagemagick                   <none>     (no description available)
pn  python-psyco                  <none>     (no description available)
pn  texlive-latex-base            <none>     (no description available)
pn  xauth                         <none>     (no description available)
pn  xvfb                          <none>     (no description available)

Versions of packages pkpgcounter suggests:
pn  abiword                       <none>     (no description available)

-- no debconf information

-- debsums errors found:
debsums: changed file /usr/share/pyshared/pkpgpdls/postscript.py (from pkpgcounter package)

*** /filespace/people/b/bpkroth/src/postscript.diff

--- postscript.py.orig	2011-10-04 13:53:32.000000000 -0500
+++ postscript.py	2011-10-05 17:48:30.000000000 -0500
@@ -28,6 +28,8 @@
 import pdlparser
 import inkcoverage
 
+import re
+
 class Parser(pdlparser.PDLParser) :
     """A parser for PostScript documents."""
     totiffcommands = [ 'gs -sDEVICE=tiff24nc -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET -r"%(dpi)i" -sOutputFile="%(outfname)s" "%(infname)s"' ]
@@ -54,7 +56,12 @@
         if self.isMissing(self.required) :
             raise pdlparser.PDLParserError, "The gs interpreter is nowhere to be found in your PATH (%s)" % os.environ.get("PATH", "")
         infname = self.filename
-        command = 'gs -sDEVICE=bbox -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET "%(infname)s" 2>&1 | grep -c "%%HiResBoundingBox:" 2>/dev/null'
+        # Actually this one reports twice as much with older versions of gs (eg: 8.62.dfsg.1-3.2lenny5)
+        #command = 'gs -sDEVICE=bbox -dPARANOIDSAFER -dNOPAUSE -dBATCH -dQUIET "%(infname)s" 2>&1 | grep -c "%%HiResBoundingBox:" 2>/dev/null'
+        # This seems to be faster and just as accurate.
+        # http://en.wikibooks.org/wiki/PostScript_FAQ#How_to_count_pages_in_a_PS_file.3F
+        # NOTE: As a hack we're hiding the broken pipe messages that yes sometimes spits out.
+        command = 'yes 2>/dev/null | gs -q -dBATCH -dPARANOIDSAFER -sDEVICE=nullpage "%(infname)s" 2>&1 | grep -c showpage 2>/dev/null'
         pagecount = 0
         fromchild = os.popen(command % locals(), "r")
         try :
@@ -66,7 +73,11 @@
             if fromchild.close() is not None :
                 raise pdlparser.PDLParserError, "Problem during analysis of Binary PostScript document"
         self.logdebug("GhostScript said : %s pages" % pagecount)    
-        return pagecount * self.copies
+        # recent versions of ghostscript (at least >= 8.71~dfsg2-9, though
+        # possibly earlier) seem to process copies correctly, even for goofy
+        # windows output that produces both individual pages and a copy tag
+        #return pagecount * self.copies
+        return pagecount
         
     def natively(self) :
         """Count pages in a DSC compliant PostScript document."""
@@ -78,6 +89,7 @@
         prescribe = False # Kyocera's Prescribe commands
         acrobatmarker = False
         pagescomment = None
+        scalepattern = re.compile(r"[0-9].*\s+scale([^f]|$)")
         for line in self.infile :
             line = line.strip()
             if (not prescribe) and line.startswith(r"%%BeginResource: procset pdf") \
@@ -167,11 +179,21 @@
                 else :    
                     if number > self.pages[pagecount]["copies"] :
                         self.pages[pagecount]["copies"] = number
+            # The scale operator is often used in n-up style printing, which
+            # this native method doesn't handle.  Set notrust and make
+            # ghostscript do it for us.
+            # NOTE: This might catch non-n-up printing, but just means we'll
+            # take a small performance hit by calling out to ghostscript.
+            elif scalepattern.search(line) :
+                notrust = True
             previousline = line
             
         # extract max number of copies to please the ghostscript parser, just    
         # in case we will use it later
         self.copies = max([ v["copies"] for (k, v) in self.pages.items() ])
+        # See notes above regarding ghostscript and copies.
+        if self.copies > 1 :
+            notrust = True
         
         # now apply the number of copies to each page
         if not pagecount and pagescomment :    
@@ -189,10 +211,13 @@
         """Count pages in PostScript document."""
         self.copies = 1
         (nbpages, notrust) = self.natively()
+        #print "nbpages: %d, notrust: %d" % (nbpages, notrust)
         newnbpages = nbpages
         if notrust or not nbpages :
             try :
                 newnbpages = self.throughGhostScript()
             except pdlparser.PDLParserError, msg :
                 self.logdebug(msg)
-        return max(nbpages, newnbpages)    
+        # max() is probably the wrong thing to do
+        #return max(nbpages, newnbpages)    
+        return newnbpages