Workflow for upstream->downstream integration

Roger Leigh rleigh at codelibre.net
Sun Aug 23 13:34:45 UTC 2009


Hi folks,

As requested by the subscription email, a little introduction for
myself: I'm a Debian developer and general free software hacker.
Over the years, I have been involved as both a primary upstream
developer of free software as well as downstream integrator
(primarily in my role as a Debian developer for the latter).
Over this time, I have also used a number of VCSes starting with
CVS/RCS and then moving up to GNU Arch and Subversion.  Nowadays
I use git for almost everything.  I'm interested in the use of DVCSes
to make it easier for developers to collaborate, both within
teams and between upstream and downstream developers and users.

One thing still lacking is easy collaboration between upstream
and downstream, whether these be the primary developer and
packager/distributor (e.g. GNU and Debian) or primary
packager/distributor and derived custom distribution
(e.g. Debian and Ubuntu).  I'd like to improve this; the description
and example code below outline what I feel is lacking, and a possible
approach for addressing it.


Some of the content of this mail is taken from a thread previously
brought up on the GNU Automake and git mailing lists.

http://thread.gmane.org/gmane.comp.sysutils.automake.general/10936
http://thread.gmane.org/gmane.comp.version-control.git/126007

I won't duplicate all the minor details in this post.
However, the issues detailed above are applicable to build systems
other than automake, and version control systems other than git.
As a result, it was suggested this list would be a good place to
discuss this, and identify the most appropriate place(s) to integrate
it.


Currently upstreams manage releases using a combination of branches
and/or tags to identify a release.  This release is then distributed
as a tar/zip file for people to use.  Distributors use this tarball
as the base for adding their distribution-specific packaging.
Nowadays many of us use a VCS to manage this.  However, most of the
tools do this by importing the upstream release tarball onto a
branch.

This is the bit that causes problems.  We have gone:
  VCS → tarball → VCS
but the intermediate tarball step is pointless when, with a
distributed VCS, we could just have gone
  VCS upstream release branch → VCS remote upstream release branch
i.e. in git a simple fetch/pull of a remote branch.  This skips a
pointless extra step, and additionally ensures there's a contiguous
history between all upstream and downstream changes, which makes it
much easier to send patches between both parties, easing ongoing
maintenance.
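
As a rough illustration of the direct route, a downstream repository could
follow an upstream branch like this.  It is only a sketch: the function
name track_upstream and the remote name "upstream" are made up for the
example, and it assumes the merge applies without conflicts.

```shell
# track_upstream URL BRANCH
# Hypothetical helper: fetch BRANCH from the upstream repository at URL and
# merge it into the branch that is currently checked out downstream.
track_upstream() {
    url=$1 branch=$2
    # Register the remote once; later runs only refresh its URL.
    git remote add upstream "$url" 2>/dev/null ||
        git remote set-url upstream "$url" || return 1
    # Update refs/remotes/upstream/* with the upstream history.
    git fetch -q upstream || return 1
    # Merge the upstream branch; both histories stay intact and connected.
    git merge -q --no-edit "upstream/$branch"
}
```

After this, "git merge-base HEAD upstream/BRANCH" names a real upstream
commit, which is what makes sending patches in either direction simple.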

So, to move to specifics.  From an upstream POV:

I've long been using automake, and I've always used "make dist[check]"
as the end part of the release process.  When using a version control
system, be it CVS, SVN or git, the following sort of workflow happens:

• tag repo
• make new checkout to get a pristine tree
• bootstrap autotools
• configure and make and make check
• make distcheck
• distribute generated release tarball

However, the "distribute release tarball" step is becoming less and
less relevant with the advent of git.

I do a lot of Debian packaging work, as well as actual software
development for Debian.  All of this work nowadays occurs in git
repositories.  For packages of software distributed by third
parties (most packages) our workflow is to have a git repository with
a minimum of two branches:

• an upstream branch - contains the contents of the release tarball
  injected into git
• a debian branch - contains the Debian packaging

There may be other branches (tracking upstream stable/development and
different Debian releases), but these are the important ones for
the context of this discussion.  We track upstream releases on one
branch, and do the packaging work on a separate branch.  [there's also
an optional pristine-tar branch for reconstructing perfect copies of
the release tarball from git, but that's not particularly relevant
here.]
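
For comparison, the tarball re-import that this layout currently requires
looks roughly like the following.  This is a simplified sketch rather than
what any particular packaging tool does: import_tarball is a made-up name,
the branch names "upstream" and "debian" follow the description above, and
it assumes a gzip-compressed tarball with a single top-level directory and
a tar that supports --strip-components (GNU tar and bsdtar do).

```shell
# import_tarball TARBALL VERSION
# Hypothetical helper: replace the contents of the "upstream" branch with the
# unpacked release tarball, then merge the result into the "debian" branch.
# Run from the top of a clean working tree; untracked files would be swept
# into the import by "git add -A".
import_tarball() {
    tarball=$1 version=$2
    git checkout -q upstream || return 1
    # Drop the previous release so files removed upstream disappear too.
    git rm -q -r --ignore-unmatch . >/dev/null || return 1
    # Unpack without the leading package-version/ directory.
    tar -xzf "$tarball" --strip-components=1 || return 1
    git add -A || return 1
    git commit -q -m "Import upstream version $version" || return 1
    git checkout -q debian || return 1
    git merge -q --no-edit upstream
}
```

Note that the upstream branch produced this way has no connection to the
upstream developers' own history; each import is just a snapshot.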

The important point to make here is that the tarball was irrelevant.
It was just an intermediate transfer format between two version
control systems (possibly even the /same/ version control system, if
not in extreme cases the exact same repository).  In an ideal world,
the upstream releases could be tracked directly within the version
control system, without any tarballs at all.

Nowadays, it's becoming increasingly common that the upstream
developers are also using git (often it's ourselves!), and it would
greatly ease collaboration and keeping up to date, as well as
simplifying our workflow, if we could skip the tarball part and just
pull the release out of the repo.

What are we missing?

The VCS does not contain exactly the same files as the release
tarball.  It does not typically contain the result of the autotools
bootstrap or the other generated files that the tarball may ship, and
the tarball in turn may omit some files only needed for work in the VCS.

We need this distributed tarball content in the VCS.  Currently we
are re-importing the tarball by hand as a downstream user, but
"make dist" could do this at the upstream end.  The upstream
developer can now have two branches in their VCS:

• a working/release branch as always
• a distribution branch containing the distributed files

The downstream user can now simply track the distribution branch
with tarball distribution and re-injection being completely
skipped.

What do we gain?

At this point, we can track all changes in git:

• tag repo
• [optionally clone clean repo to be sure you get a non-crufty
  environment]
• make distcheck [enter GPG key to sign the distributed release]

  upstream release branch [+tag]
  → upstream distribution branch [+tag]
    → downstream packaging branch [+tag]
      [ → user modifications ]
      [ → further downstream distributor changes ]

Everything can potentially be in git, and patches can be sent between
all parties with ease.

What can automake do?

• Add a dist-git option and Makefile target.
  This will cause $distdir to be injected into git, rather than just
  calling tar as for the other dist targets.

• This will require some additional options in order to work correctly:
  · A branch name (the head to append the new tree to)
  · [optional] Tag name, could be a pattern such as dist/$(VERSION)
  · [optional] Flag for signing the tag or not
  · [optional] Template commit message
  These could all just be variables in the top-level Makefile.am.

Normal git usage takes place in the working tree.  Because we're
already in a working tree, we make use of a neat git feature:
using GIT_INDEX_FILE to operate with an alternative index.  This
way, we can add the entire contents of $distdir to the alternate
index, and then commit that onto a separate branch using the
low-level git core plumbing git write-tree and git commit-tree,
followed by git update-ref.

% make distdir
% GIT_INDEX_FILE=idx GIT_WORK_TREE=$distdir git write-tree
⇒ 4b825dc642cb6eb9a060e54bf8d69288fbee4904
% echo "Distribution of $(PACKAGE) version $(VERSION)" [|gpg --clearsign] | git commit-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
⇒ 04bd8ea48e17d845d22cfcfb10bde1a4dee06caf
% git update-ref refs/heads/distribution 04bd8ea48e17d845d22cfcfb10bde1a4dee06caf
% git tag -s ...
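
The sequence above can be sketched as a standalone shell function,
including the "git add -A" step that fills the alternate index.  The name
commit_dir_to_branch and its arguments are invented for illustration, and
the use of "git rev-parse --absolute-git-dir" assumes a reasonably recent
git (2.13 or later).

```shell
# commit_dir_to_branch DIR BRANCH MESSAGE
# Hypothetical helper: snapshot DIR as a new commit on BRANCH without touching
# the current working tree, the main index or HEAD.  Run inside a git repo.
commit_dir_to_branch() {
    dir=$1 branch=$2 message=$3
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    index=$(mktemp) || return 1
    rm -f "$index"      # git must create the index file itself

    # Stage everything under DIR into the scratch index and write a tree.
    tree=$(
        cd "$dir" || exit 1
        export GIT_DIR="$git_dir" GIT_INDEX_FILE="$index" GIT_WORK_TREE="$PWD"
        git add -A && git write-tree
    )
    rm -f "$index"
    [ -n "$tree" ] || return 1

    # Chain onto the branch tip if the branch already exists.
    parent=$(git rev-parse -q --verify "refs/heads/$branch" || true)
    if [ -n "$parent" ]; then
        commit=$(printf '%s\n' "$message" | git commit-tree "$tree" -p "$parent")
    else
        commit=$(printf '%s\n' "$message" | git commit-tree "$tree")
    fi
    [ -n "$commit" ] || return 1

    # Passing the old value makes the update fail if the branch moved meanwhile.
    git update-ref "refs/heads/$branch" "$commit" "$parent" || return 1
    printf '%s\n' "$commit"
}
```

It would be called as, for example,
commit_dir_to_branch "$distdir" distribution "Distribution of foo 1.0",
and prints the id of the new commit.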

A full working implementation is provided at the end of this mail.
This could possibly be provided as a separate script, rather than
in the generated Makefile, but it does hook into automake-specific
targets.  The distributed release is put on a distribution branch,
and its parents are both the previous distributed release and the
current release head, i.e. it's a merge of the old distributed release
and the current release.  This lets you do easy merging of changes
between both branches, with a correct history.
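
To make the two-parent structure concrete, here is a minimal sketch of
just the commit step.  The function name dist_merge_commit is made up, and
the tree is assumed to have been written already (for instance with
git write-tree on an alternate index).  The parent order, release head
first and previous distribution commit second, follows the implementation
at the end of this mail.

```shell
# dist_merge_commit TREE RELEASE BRANCH MESSAGE
# Hypothetical helper: record TREE on BRANCH as a commit whose parents are the
# RELEASE head and, if BRANCH already exists, its previous tip.
dist_merge_commit() {
    tree=$1 release=$2 branch=$3 message=$4
    release_head=$(git rev-parse -q --verify "$release^{commit}") || return 1
    dist_parent=$(git rev-parse -q --verify "refs/heads/$branch" || true)

    # Build the parent list: the release head always, the old tip if present.
    set -- -p "$release_head"
    if [ -n "$dist_parent" ]; then
        set -- "$@" -p "$dist_parent"
    fi

    commit=$(printf '%s\n' "$message" | git commit-tree "$tree" "$@") || return 1
    git update-ref "refs/heads/$branch" "$commit" "$dist_parent" || return 1
    printf '%s\n' "$commit"
}
```

"git rev-list --parents -n 1 distribution" then lists the new commit
followed by both parents, and the release head is reachable from the
distribution branch, so git sees the two lines of history as related.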

Debian has now got a new source package format, which includes
(currently experimental) support for git as a source distribution
format.  With the above missing piece in place, it will be possible to
have all the sources in git from upstream to distributor to end user,
with a complete GPG signed history of all changes made by upstream
and the distributor (and any other distributors downstream of Debian).
Of course, the feature as proposed has many other uses than this, but
that's where I'd like to get to!


Generalising:
The script does three things:
• tag release
• import/merge distribution release onto distribution branch
• tag distribution
None of these are git-specific.  These steps could potentially be
generalised for other VCSes.


Regards,
Roger


This drops into any automake Makefile.am:
Run with make ENABLE_DIST_GIT=true dist-git
The ENABLE_DIST_GIT is a safety measure to avoid inadvertent repo
modifications.

GIT_RELEASE_BRANCH=HEAD
GIT_RELEASE_TAG=true
GIT_RELEASE_TAG_SIGN=true
GIT_RELEASE_TAG_NAME=release/$(PACKAGE)-$(VERSION)
GIT_RELEASE_TAG_MESSAGE="Release of $(PACKAGE)-$(VERSION)"

GIT_DIST_BRANCH=distribution
GIT_DIST_COMMIT_MESSAGE="Distribution of $(PACKAGE) version $(VERSION)"
GIT_DIST_TAG=true
GIT_DIST_TAG_SIGN=true
GIT_DIST_TAG_NAME=distribution/$(PACKAGE)-$(VERSION)
GIT_DIST_TAG_MESSAGE="Distribution of $(PACKAGE)-$(VERSION)"

dist-git: distdir
	if [ "$(ENABLE_DIST_GIT)" != "true" ]; then \
	    echo "$@: ENABLE_DIST_GIT not true; not distributing"; \
	    exit 0; \
	fi; \
	cd "$(abs_top_srcdir)"; \
	if [ ! -d .git ]; then \
	    echo "$@: Not a git repository" 1>&2; \
	    exit 1; \
	fi; \
	if [ "$(GIT_RELEASE_TAG)" = "true" ]; then \
	  if git show-ref --tags -q $(GIT_RELEASE_TAG_NAME); then \
	    echo "git release tag $(GIT_RELEASE_TAG_NAME) already exists; not distributing" 1>&2; \
	    exit 1; \
	  fi; \
	fi; \
	if [ "$(GIT_DIST_TAG)" = "true" ]; then \
	  if git show-ref --tags -q $(GIT_DIST_TAG_NAME); then \
	    echo "git distribution tag $(GIT_DIST_TAG_NAME) already exists; not distributing" 1>&2; \
	    exit 1; \
	  fi; \
	fi; \
	echo "$@: distributing $(PACKAGE)-$(VERSION) on git branch $(GIT_DIST_BRANCH)"; \
	DISTDIR_INDEX="$(abs_top_builddir)/$(distdir).git.idx"; \
	DISTDIR_TREE="$(abs_top_builddir)/$(distdir)"; \
	rm -f "$$DISTDIR_INDEX"; \
	GIT_INDEX_FILE="$$DISTDIR_INDEX" GIT_WORK_TREE="$$DISTDIR_TREE" git add -A || exit 1; \
	TREE="$$(GIT_INDEX_FILE="$$DISTDIR_INDEX" git write-tree)"; \
	rm -f "$$DISTDIR_INDEX"; \
	[ -n "$$TREE" ] || exit 1; \
	RELEASE_HEAD="$$(git rev-parse -q --verify "$(GIT_RELEASE_BRANCH)^{commit}")"; \
	[ -n "$$RELEASE_HEAD" ] || exit 1; \
	COMMIT_OPTS="-p $$RELEASE_HEAD"; \
	DIST_PARENT="$$(git show-ref --heads -s refs/heads/$(GIT_DIST_BRANCH))"; \
	if [ -n "$$DIST_PARENT" ]; then \
	  COMMIT_OPTS="$$COMMIT_OPTS -p $$DIST_PARENT"; \
	fi; \
	COMMIT="$$(echo $(GIT_DIST_COMMIT_MESSAGE) | git commit-tree "$$TREE" $$COMMIT_OPTS)"; \
	[ -n "$$COMMIT" ] || exit 1; \
	git update-ref "refs/heads/$(GIT_DIST_BRANCH)" "$$COMMIT" "$$DIST_PARENT" || exit 1;\
	echo "$@: tree=$$TREE"; \
	echo "$@: commit=$$COMMIT"; \
	if [ "$(GIT_RELEASE_TAG)" = "true" ]; then \
	  RELEASE_TAG_OPTS=""; \
	  if [ "$(GIT_RELEASE_TAG_SIGN)" = "true" ]; then \
	    RELEASE_TAG_OPTS="$$RELEASE_TAG_OPTS -s"; \
	  fi; \
	  git tag -m $(GIT_RELEASE_TAG_MESSAGE) $$RELEASE_TAG_OPTS "$(GIT_RELEASE_TAG_NAME)" "$$RELEASE_HEAD" || exit 1; \
	  echo "$@: release tagged as $(GIT_RELEASE_TAG_NAME)"; \
	fi; \
	if [ "$(GIT_DIST_TAG)" = "true" ]; then \
	  DIST_TAG_OPTS=""; \
	  if [ "$(GIT_DIST_TAG_SIGN)" = "true" ]; then \
	    DIST_TAG_OPTS="$$DIST_TAG_OPTS -s"; \
	  fi; \
	  git tag -m $(GIT_DIST_TAG_MESSAGE) $$DIST_TAG_OPTS "$(GIT_DIST_TAG_NAME)" "$$COMMIT" || exit 1; \
	  echo "$@: distribution tagged as $(GIT_DIST_TAG_NAME)"; \
	fi;
	$(am__remove_distdir)


-- 
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.