How to cope with patches sanely

Mon Feb 25 09:33:48 UTC 2008

On Mon, Feb 25, 2008 at 03:37:07AM +0000, Manoj Srivastava wrote:
> On Sun, 24 Feb 2008 21:17:10 -0500, David Nusinow <dnusinow at speakeasy.net> said: 
> 
> > On Sun, Feb 24, 2008 at 06:08:17PM -0800, Russ Allbery wrote:
> >> David Nusinow <dnusinow at speakeasy.net> writes:
> >> 
> >> > The problem is that you and Manoj assume that this is the only way
> >> > to do things. I don't believe this. Pierre Habouzit has been
> >> > experimenting with an alternative method of feature branches that
> >> > exports to a linear stack of diffs just fine. Just because Manoj is
> >> > doing something one way right now doesn't mean it's the only or
> >> > even the correct way to do it.
> 
>         I would be interested in details of this, and whether this
>  approach works with pure feature branches where the features are being
>  developed contemporaneously with each other and upstream development;
>  and thus the branches overlap both temporally and in code space.

  I'm planning to write a textual version of what I demonstrated at
FOSDEM, with some more ideas that I had talking with Julien Cristau on
the grass after.

  You developed them contemporaneously, okay, but in the end you merge
them one after the other. If you're doing criss-cross merges, well, I
can do nothing for you: you're creating really messy histories, and
yes, you need an SCM to represent that in a satisfying way.

  But if you really merge one feature branch on top of the other, which
in my experience is *more than enough* for what we need in Debian
packaging, then multiple branches are just multiple series to be applied
in a specific relative order.

  The thing is, I believe that there are two kinds of patches. The first
kind is patches that you backport from upstream. If you're lucky enough
to have an upstream using git, it's just a matter of merging the stable
branch into yours, or cherry-picking some patches, which will not create
conflicts when you merge back. This goes in the .diff.gz, and it's okay
(at least I think so) because these are patches that _everyone_ can take
from upstream as well. You don't need to make them special, and it's
always possible to generate some kind of flat file to say, okay, I
cherry-picked this patch, this patch and this patch from upstream, or
merged up to this point of this upstream branch. This information is
more useful than the patch series for derived distros, or co-maints.
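
  One possible way to get such a flat file, sketched in Python: parse
the output of `git cherry -v <upstream> <head>`, which marks with '-'
the commits on <head> whose change already exists upstream (same
patch-id) and with '+' the ones that do not. The function names below
are made up for illustration. A cherry-pick that had to be adapted by
hand gets a different patch-id and will show up as '+', so treat the
result as a starting point only.

```python
def split_git_cherry(output):
    """Split `git cherry -v <upstream> <head>` output into two lists.

    '-' lines: commits on <head> whose change is already upstream.
    '+' lines: commits on <head> with no equivalent upstream.
    The hashes are those of the commits on <head>, not upstream's.
    """
    backported, local = [], []
    for line in output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 2 or parts[0] not in ("+", "-"):
            continue  # ignore anything that is not a git-cherry line
        sign, sha = parts[0], parts[1]
        subject = parts[2] if len(parts) > 2 else ""
        (backported if sign == "-" else local).append((sha, subject))
    return backported, local


def flat_listing(backported):
    """Render the 'patches I took from upstream' flat file, one per line."""
    return "".join(f"{sha} {subject}\n" for sha, subject in backported)
```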

  When it comes to specific patches of yours, I really believe that
topic branches like you advertise them are the best answer. Git makes
merging easy (s/Git/reasonable $DSCM/ for this matter btw) in the sense
that merging is fast enough, and easy enough when the branches you merge
have not diverged too much. I mean, no matter which SCM you use, merging
from a branch that is _very_ old, and still not merged upstream, is just
a pain. And again, it's not an SCM issue. A patch queue _is_ a branch in
itself. Really. There are two ways to look at that. Either you say, I
always want to remember I started from this point, and then you merge
and merge and merge, and your history looks like this:

R are upstream releases, M are your repeated merges to keep the feature
branch current.

-o---o---o--[...]--R--[...]--R--[...]--o--
  \                \         \
   p--p--p--p-------M---------M----...[feature branch]

  Well, with this approach, upstream will have to take a messy history
with a _lot_ of merge points they don't care about, and won't be able to
try your feature branch on top of their current work and maybe
eventually adopt it. And worse, if you have to add new patches along the
way, you get a history with a mixed sequence of patches and merges, which
is unreadable to upstream.

  The other way is to forget about giving depth *in* the SCM to the
patches' history. Because that's what it's about. What you really want
IMSHO is: I have this patch queue [pq], and at upstream R0 it was in
state pq0, at upstream R1 it was in state pq1, and so on. Without any
useless merge points in it. This way your feature branch is a free (as
in only attached to history by its base) branch that you rewrite for
each new upstream and serialize under debian/patches/<featurebranch>. In
git, there is this awesome git-quiltimport command that allows you to
rebuild a branch from a quilt series in a trivial way. If you want to
work on the patch queue, you just need to make it a branch again, do
your stuff, serialize it again, and you're done.
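
  A minimal Python sketch of that round trip, under a few assumptions:
the helper names (write_series, serialize, rebuild) are hypothetical,
the patch directory holds only this queue's patches, and
`git format-patch` is used for the export since it writes numbered files
that sort in commit order. `git quiltimport` reads the `series` file in
the directory given by --patches, so the sketch writes one.

```python
import subprocess
from pathlib import Path


def write_series(patch_dir):
    """Write a quilt 'series' file listing *.patch files in sorted order."""
    patch_dir = Path(patch_dir)
    names = sorted(p.name for p in patch_dir.glob("*.patch"))
    (patch_dir / "series").write_text("".join(n + "\n" for n in names))
    return names


def serialize(base, branch, patch_dir, run=subprocess.check_call):
    """Export the commits in base..branch as a quilt series in patch_dir."""
    # format-patch writes one numbered file per commit (0001-..., 0002-...)
    run(["git", "format-patch", "-o", str(patch_dir), f"{base}..{branch}"])
    return write_series(patch_dir)


def rebuild(base, branch, patch_dir, run=subprocess.check_call):
    """Recreate the branch on top of base by replaying the quilt series."""
    # -B creates the branch, or resets it if it exists: the queue is
    # rewritten for each new upstream rather than merged
    run(["git", "checkout", "-B", branch, base])
    run(["git", "quiltimport", "--patches", str(patch_dir)])
```

  Something like serialize("upstream", "featurebranch",
"debian/patches/featurebranch") would be run before an upload, and
rebuild() with the new upstream as base when the queue needs work again.
The `run` parameter is only there so the commands can be inspected
without executing them.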

  By doing that, your workflow allows people to make meaningful changes
to your package (by adding patches to a given queue) that you'll
transparently and *painlessly* import into your workflow. Whereas with
your current one, you'll have to extract whatever the NMUer did, which
is a flat debdiff, and split it. It's horrible for you, please don't
pretend otherwise, I won't believe you. The other gain is that upstream
can look at a current, unencumbered patch queue for the feature you
added, and can make a decent decision about whether it's worth taking
upstream or not, and it's trivial to export such a branch to upstream:

  http://git.madism.org/?p=packages/tokyocabinet.git;a=shortlog;h=upstream%2Bpatches

  The patches between the [upstream] and [upstream+patches] stickers are the
patches I apply. Upstream doesn't need to grok a single word of git to
get the patches back. *I* give them back in a readable way, with comments
and decent history, without having to do anything special for it.

  Last but not least, what I recommend here for packaging would be
complete hell if you diverge a _lot_ from your upstream. But if you
diverge a lot, then you should rather officially start a fork, and do
your feature branches and stuff there, because for long and
complicated features I _do_ agree that it's the only approach that makes
sense, and then you'll package the fork. Of course, I don't pretend that
rebasing is nice for patch queues with 100 patches in them. It isn't. But
I also assume that you never have 100-patch-long feature branches at
all.

> > Ok, that's fair. In the worst case then people who want to use this
> > sort of workflow could stick everything in a giant diff like we do
> > now, so nothing would be lost.
> 
>         Or have dpkg understand not just quilt, but git.

  There is little point in that. You should read up on git's history (on
LWN IIRC, or maybe it was his Google talk), where Linus explains the
exchange format he designed to have a gateway between bk and $scm. It
was basically just flat patches. The only thing that flat patches can't
represent nicely is merge points, but you can take the simple approach
of doing this:

debian/patches/<feat-branch-1>/
debian/patches/feat1-to-feat2.patch
debian/patches/<feat-branch-2>/
debian/patches/feat2-to-feat3.patch
debian/patches/<feat-branch-3>/
debian/patches/feat3-to-feat4.patch
...

  With the semantics that you want to apply branch1, then branch2, then
branch3, and so on. And in between you can have feat<i>-to-feat<i+1>.patch
patches to pre-resolve conflicts, which keeps your branches more
independent of one another while still allowing them to be applied to
the source package in a linear way (it doesn't mean that you have to
have them one after the other in your SCM, you just fake the merges a
bit differently).
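
  To make those semantics concrete, here is a small Python sketch that
turns such a layout into one linear apply order. It assumes that patches
inside each branch directory are applied in sorted filename order and
that a glue patch is named <branch>-to-<next>.patch; the function name
is invented for the example.

```python
from pathlib import Path


def linear_apply_order(patches_dir, branches):
    """Return patch paths (relative to patches_dir) in apply order.

    For each branch in turn: its own patches in sorted order, then the
    optional '<branch>-to-<next>.patch' that pre-resolves conflicts with
    the branch applied after it.
    """
    patches_dir = Path(patches_dir)
    order = []
    for i, name in enumerate(branches):
        order.extend(sorted((patches_dir / name).glob("*.patch")))
        if i + 1 < len(branches):
            glue = patches_dir / f"{name}-to-{branches[i + 1]}.patch"
            if glue.exists():
                order.append(glue)
    return [p.relative_to(patches_dir).as_posix() for p in order]
```

  Joining the returned names with newlines would give a single quilt
series for the whole source package, while each branch still lives in
its own directory.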

  FWIW I'm willing to write a git-quiltexport tool generating basically
that kind of stuff for you.

-- 
·O·  Pierre Habouzit
··O                                                madcoder at debian.org
OOO                                                http://www.madism.org