patches-applied historical imports (usd import)

Nish Aravamudan nish.aravamudan at canonical.com
Fri Jan 6 22:26:29 UTC 2017


Hi Ian,

I've got a few questions/comments about git-based publishing imports and
history for patches-applied imports that I was hoping to bounce off you
and other VCS folks. I apologize for the very long e-mail to follow!

For context for everyone: I (along with Robie Basak and others) have
been developing an importer that will take the publishing history of a
source package and import the source package contents into a git tree,
with tags for each publish and branches for each 'series'. That's a
broad gloss of the details, of course, but probably sufficient. This
could be, I think, of use to both Ubuntu and Debian. If you're not
interested in this, though, please feel free to disregard :)

On to the questions/comments:

1) For some source packages (bouncycastle and php7.0 are the ones I can
think of off the top of my head), the upstream tarballs contain
.gitattributes files, which change the behavior of git itself when
checking out a
branch. This is not by definition a problem, except that to get to a
fully patches-applied state, I believe you must check out an
appropriate commit (meaning you're adjusting the working
directory's contents) -- which may then differ from what is shipped by
the upstream tarball. I have seen this with eol adjustments, and much
more annoyingly (because git knows it is doing it, while vi/emacs do
not) the special ident handling. For now, we've added the following to
our git repository's .git/info/attributes at checkout time (using our
wrapper to `git clone`):

* -ident
* -text
* -eol

In other words, the underlying issue is that the upstream uses git as
well, and their git 'configuration' (not necessarily just .gitconfig)
will interfere with the behavior of any git using the upstream contents
in the working tree. Does the above seem reasonable? Afaict, there is no
way for me to really enforce the above in the repository itself, without
patching the upstream source.
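
A minimal sketch of the clone-time step described in 1), not the
importer's actual code: it records the three override lines quoted above
in a repository's .git/info/attributes, which git consults before any
.gitattributes file in the tree. The function name and the choice to
keep any lines already in the file are assumptions made for
illustration.

```python
from pathlib import Path

# The three override lines quoted in the message above.
OVERRIDES = ["* -ident", "* -text", "* -eol"]


def write_attribute_overrides(repo_dir):
    """Ensure <repo_dir>/.git/info/attributes contains the overrides.

    Hypothetical helper: existing lines are kept, and only the missing
    override lines are added at the end.
    """
    info_dir = Path(repo_dir) / ".git" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    attributes = info_dir / "attributes"

    existing = attributes.read_text().splitlines() if attributes.exists() else []
    missing = [line for line in OVERRIDES if line not in existing]

    # Rewrite the whole file so a missing trailing newline cannot glue
    # an old line onto a new one.
    attributes.write_text("\n".join(existing + missing) + "\n")
    return attributes
```

Calling it twice on the same repository leaves the file unchanged the
second time, so a clone wrapper can run it unconditionally.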

2) How do we determine if a source package is 1.0 vs. 3.0? I am
currently using `dpkg-source --print-format`, but have found one source
package (util-linux 2.13~rc3-5), where dpkg-source emits:

syntax error in /tmp/tmp3y515osf/util-linux-2.13~rc3/debian/control at
line 14: duplicate field Depends found

and thus we error out.
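
One possible fallback, sketched under assumptions rather than taken from
the importer: read debian/source/format directly and treat a missing
file as format 1.0, which sidesteps parsing debian/control altogether.
The function name is hypothetical, and whether this matches
`dpkg-source --print-format` in every corner case is not something this
sketch establishes.

```python
from pathlib import Path


def guess_source_format(unpacked_dir):
    """Return the declared source format of an unpacked source tree.

    Assumption: a tree without debian/source/format is a 1.0 package.
    An empty file is also reported as "1.0" here, which is a choice made
    for this sketch only.
    """
    format_file = Path(unpacked_dir) / "debian" / "source" / "format"
    try:
        lines = format_file.read_text().splitlines()
    except FileNotFoundError:
        return "1.0"
    first_line = lines[0].strip() if lines else ""
    return first_line or "1.0"
```

Because debian/control is never read, a duplicate field there cannot
make this check fail.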

3) Imagine the following graph in the git repository:

   A           D
 ->o.....f....>o->
   ^ .       . ^
   |  c     e  |
   o   .   .   o
   ^    . .    ^
   |     B     |
   o     o     o
   ^     ^     ^
   |     |     |
 ->o---->o---->o-> 
   a     b     d

Each o is a commit in the repository

a, b, d are patches-unapplied imports of publishes for a given release,
which are on a fast-forwarding, patches-unapplied branch.

A is the corresponding patches-applied import of a (with each o
reflecting one patch application from the source package).

D is the corresponding patches-applied import of d (with each o
reflecting one patch application from the source package).

c, e and f are for demonstration purposes and do not necessarily exist
(except as discussed below).

Ideally, we'd have a fast-forwarding branch for the patches-applied
imports, as well.

Let's assert that there is a problem with obtaining the patches-applied
version of b. This can occur for (at least) the following reasons:

i) as in 2), we might not be able to determine the source package format
(implementation detail, to some degree), so are unable to correctly
derive if there is a patches-applied state that is distinct from b.

ii) some patches may fail to apply with a trivial `quilt push`. This
occurs with, at least, a historical publish of samba.

In theory, there are other reasons/cases where this might happen and the
importer needs to never fail (so it is of some use to run automatically
:)

The questions I have relate to what to do when we encounter this
situation, which in turn is divided into two parts:

i) Do we want to 'tag' this failed-to-apply patches-applied import in
the repository (currently, every successful patches-applied import is
tagged as 'applied/<dep14 of the published version>')? This is important
for semantics for end-users (and the importer itself).
  - We currently assert that tagged objects in the tree correspond to
    the source package as published in Launchpad/archive. Tagging this
    failed-to-apply state as such would violate that assertion.
  - We also rely (implementation detail) on being able to find 'nearest'
    publishes by tag name, which sort of leads into the next issue...
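
For readers unfamiliar with the tag scheme, here is a rough sketch of
how a version could be turned into an 'applied/...' tag name. It follows
the DEP-14 version mangling rules as I understand them (':' becomes '%',
'~' becomes '_', and '#' is inserted after a dot that git would
otherwise reject); the function names are hypothetical and this is not
the importer's implementation.

```python
import re


def dep14_mangle(version):
    """Mangle a Debian version into a string usable in a git ref name."""
    mangled = version.replace(":", "%").replace("~", "_")
    # git rejects "..", a trailing ".", and a ".lock" suffix in ref
    # names, so a "#" is inserted after the offending dot.
    return re.sub(r"\.(?=\.|$|lock$)", ".#", mangled)


def applied_tag(version):
    """Hypothetical helper: tag name for a patches-applied import."""
    return "applied/" + dep14_mangle(version)
```

For the util-linux version mentioned in 2), `applied_tag("2.13~rc3-5")`
gives `applied/2.13_rc3-5`.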

ii) What should happen to the branch?
  - if the branch is left at A, then (even if only momentarily), upon
    finishing the import at B, the branch does not reflect the latest
    state in Launchpad relative to the importer's progress. 
  - if the branch is left at B, then it is no longer fast-forwarding, as
    there is no connection between A and B.
  - if the branch is left at B, and we add the connection c to make it
    fast-forwarding, we violate a different semantic we assert about
    parenting relationships in our repository: namely, that a commit
    contains everything in all of its parents. Equally so, it doesn't
    really make sense to put c in, as B does not represent a fully
    patches-applied import, while A does.
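
The fast-forwarding constraint can be illustrated with a toy model of
the graph, assuming commits are plain strings and `parents` maps each
commit to its parents (both are illustration-only simplifications, with
the intermediate per-patch commits collapsed). A branch move from
old_tip to new_tip is fast-forwarding exactly when old_tip is reachable
from new_tip.

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` via parent links.

    A commit counts as its own ancestor.
    """
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """True if moving a branch from old_tip to new_tip is a fast-forward."""
    return is_ancestor(parents, old_tip, new_tip)


# Simplified graph from the diagram, without the optional edges c, e, f.
GRAPH = {"b": ["a"], "d": ["b"], "A": ["a"], "B": ["b"], "D": ["d"]}

# The same graph with edge c added, i.e. B gains A as a second parent.
GRAPH_WITH_C = dict(GRAPH, B=["b", "A"])
```

With `GRAPH`, `is_fast_forward(GRAPH, "A", "B")` is false, which is the
second bullet above; with `GRAPH_WITH_C` it becomes true, at the cost of
the parenting semantic described in the third bullet.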

Let's presume that this failure is not persistent and that D can be
imported successfully. We again have to make decisions about
parenting. I think it only makes sense for one of c or f to exist, based
upon what we decide is the right policy above.

In sum, I think we have one of two/three options:

1) When we encounter a failure to derive a patches-applied import of a
publish, whatever the reason, we do nothing with that treeish. It will
not be present in the history of any branches or even be tagged.
  1a) Slightly less extreme, we will not place it in the history, but we
  will tag it as 'broken/applied/<dep14 of version>' or so?
2) We will always treat patches-applied as 'best-effort', so that, for
instance, if we do fail to apply all the patches for a given publish, we
will simply tag the last successful application as the patches-applied
import. The parenting relationships for the applied branches will not
have the same meaning as the unapplied branches, but will always exist.
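
To make the options above concrete, here is a small sketch of how they
differ in what gets tagged. Everything here is hypothetical: the policy
names "skip", "broken-tag" and "best-effort" stand for options 1, 1a and
2, `apply_one` stands in for whatever applies a single patch (for
example one `quilt push`), and the version is assumed to be already
mangled for use in a tag name.

```python
POLICIES = ("skip", "broken-tag", "best-effort")


def apply_best_effort(series, apply_one):
    """Apply patches in order and stop at the first failure.

    Returns (applied_patches, failed_patch); failed_patch is None when
    the whole series applied.
    """
    applied = []
    for patch in series:
        if not apply_one(patch):
            return applied, patch
        applied.append(patch)
    return applied, None


def choose_tag(mangled_version, fully_applied, policy):
    """Return the tag name for an import, or None if it is left untagged."""
    if policy not in POLICIES:
        raise ValueError("unknown policy: %r" % (policy,))
    if fully_applied or policy == "best-effort":
        # Option 2 tags the last successful application as the import.
        return "applied/" + mangled_version
    if policy == "broken-tag":
        # Option 1a keeps the result findable but clearly marked.
        return "broken/applied/" + mangled_version
    # Option 1: the failed import is neither tagged nor in any history.
    return None
```

Under "best-effort" a failed import gets the same tag name as a
successful one, which is exactly the loss of meaning that option 2
accepts.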

There might be other options than these, but it's what I've come up with
so far. Any comments, suggestions and/or feedback would be greatly
appreciated!

Thanks!
-Nish

-- 
Nishanth Aravamudan
Ubuntu Server
Canonical Ltd



More information about the vcs-pkg-discuss mailing list