[Pkg-puppet-devel] "since prinstine-tar is known to be broken" [was: Re: packaging puppet and clojure ITPs]

Russ Allbery rra at debian.org
Sun Jul 5 20:45:41 BST 2020


Gabriel Filion <gabriel at koumbit.org> writes:

> I personally am not certain to understand the end-goal of pristine-tar
> and how it actually achieves it. I've read wiki pages about it and I
> didn't feel like after reading the description of the original
> problematic that led up to pristine-tar in the first place I was
> actually able to grasp the nature of the problematic.

The original goal of pristine-tar was to be able to do all packaging tasks
from a single Git repository without having to check in a (possibly large)
binary tar file to get the upstream tarball required for a Debian source
package.  pristine-tar attempts to recreate a byte-for-byte identical copy
of an upstream tarball that was previously committed with the tool.  It
does this by drawing on the files that were checked into the Git repository
(on an upstream branch, generally) and then using tar and gzip with special
flags in conjunction with a binary delta.
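
To make the "byte-for-byte identical" requirement concrete, here is a
minimal Python sketch.  It is not how pristine-tar is implemented; it only
illustrates that the same file contents produce the same tar bytes only
when every piece of archive metadata (timestamps, owner, header format)
matches.  The file names, timestamps, and the build_tar helper are all
made up for the illustration.

```python
import hashlib
import io
import tarfile


def build_tar(files, mtime, fmt=tarfile.GNU_FORMAT):
    """Build an uncompressed tar in memory with fully pinned metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime      # pinned timestamp
            info.mode = 0o644       # pinned permissions
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def digest(blob):
    return hashlib.sha256(blob).hexdigest()


# Hypothetical upstream tree, standing in for the files on an upstream branch.
files = {
    "pkg-1.0/README": b"hello\n",
    "pkg-1.0/src/main.c": b"int main(void) { return 0; }\n",
}

a = build_tar(files, mtime=1593978341)
b = build_tar(files, mtime=1593978341)                            # same inputs
c = build_tar(files, mtime=1593978342)                            # one second later
d = build_tar(files, mtime=1593978341, fmt=tarfile.USTAR_FORMAT)  # other header format

print("same metadata, same bytes:     ", a == b)
print("different mtime, same bytes:   ", a == c)
print("different format, same bytes:  ", a == d)
```

Only the first comparison yields identical archives.  Any detail that
cannot be reconstructed from the checked-in files has to be recorded
somewhere, which is the role the binary delta plays.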

The alternative back in the day was to keep a directory of tarballs around
(which some folks checked into Subversion or Git, but that was irritating
to manage).  Now there are good, automated tools for downloading the
previous upstream tarball directly from the archive, so pristine-tar is
somewhat less interesting than it used to be, although it can still be
useful for off-line packaging.

Also, the whole point of pristine-tar is to reproduce a release artifact
that upstream has blessed with some significance, such as by putting it on
the official download page.  The historical idea was to base Debian
packaging on the official upstream release artifact so that one can trace
provenance back to the upstream release and its cryptographic signatures
or checksums and be sure that no one has introduced some changes into the
Debian packaging that aren't explicitly represented.  But it's now
increasingly common for upstreams to use a Git tag as their official
release marker and not care much about the tarball artifacts.  This
undermines much of the motivation for pristine-tar.  If you're going to
generate a tarball from a Git tag, you're not reproducing an upstream
artifact that anyone cares about, and the only thing that matters is the
basic mechanics of being able to get ahold of that tarball again when you
want to create a new Debian package.  For that, we now have other tools,
such as origtargz (which can also handle pristine-tar if you want).

> ... with that context in place, I'm wondering how pristine-tar is broken
> and if you would be so kind as to summarize an example situation around
> this that you experienced first-hand.

The thing that pristine-tar tries to do is extremely difficult, since it
has to reproduce whatever weirdness upstream did when creating its
tarball.  Sometimes that was done on a non-Linux host with a non-GNU tar,
sometimes it was done with ancient software or odd flags, and, more
commonly these days, sometimes it was compressed with a compression
program whose output changes somewhat from version to version.
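
The compression step can be illustrated the same way.  The following is a
small Python sketch, again not pristine-tar's own code, that compresses the
same payload twice with gzip and changes only the timestamp recorded in the
gzip header.  The payload and the timestamps are invented for the example.

```python
import gzip

# Stand-in for the contents of an upstream tar file.
payload = b"pretend this is the content of an upstream tar file\n" * 200

# Same data, same compression level, different header timestamp.
first = gzip.compress(payload, compresslevel=9, mtime=1)
second = gzip.compress(payload, compresslevel=9, mtime=2)

same_content = gzip.decompress(first) == gzip.decompress(second) == payload
same_bytes = first == second

print("decompressed content identical:", same_content)
print("compressed bytes identical:    ", same_bytes)
```

Both files decompress to the same data, but the compressed files themselves
differ, so their checksums differ too.  A different compressor version or
different settings can cause the same kind of mismatch in the compressed
stream itself, which is much harder to reconstruct after the fact than a
header field.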

As a result, many folks have discovered that while pristine-tar normally
works fine during continuous unstable development, if you go back and try
to check out a tarball with pristine-tar that was created several years
ago, or try to recreate a tarball on a Debian stable or oldstable system
that was checked in on a Debian unstable system, pristine-tar will fail.

This seems to affect some people more than others.  xz appears to be more
likely to cause problems than gzip.  Some people never have problems;
other people run into problems all the time.

-- 
Russ Allbery (rra at debian.org)              <https://www.eyrie.org/~eagle/>


