[Debian-med-packaging] metastudent_1.0.7-1_amd64.changes REJECTED

"Steffen Möller" steffen_moeller at gmx.de
Wed Apr 24 08:52:16 UTC 2013


Dear all,

For theoretical computational biologists, the availability of large data sets is a non-issue. They have all the freedom to just generate their very own, following some distribution observed on real data, no matter how useless the data may be beyond a publication of their method. In that sense, I am very much in favour of accepting into main everything that works on biological or synthesized biological data, if data availability is the only concern and if the package builds without direct access to the data.

For practical purposes, and this is where Debian Med really thrives, we need to put our heads together on these data questions. We have two tools for auto-installing data in our repository: getData, which is mine and Charles', and Olivier's BioMaj. Would automation of the data preparation by either of these tools be sufficient? It would still leave the package non-DFSG-free, since we need the Internet, but anybody active in computational biology almost certainly has net access too. To give another example: BLAST, which nobody sensibly wants outside of main, also works only with data. And it is (by some groups) used on tens of gigabytes. We can certainly come up with scenarios for Tobias' tool that require less than he has already invested in.

I suggest accepting the package into main.

Cheers,

Steffen


Sent: Saturday, 20 April 2013 at 18:01
From: "Laszlo Kajan" <lkajan at debian.org>
To: FTPMaster <ftpmaster at debian.org>, "Debian Med Packaging Team" <debian-med-packaging at lists.alioth.debian.org>
Cc: "Tobias Hamp" <hampt at rostlab.org>
Subject: Re: [Debian-med-packaging] metastudent_1.0.7-1_amd64.changes REJECTED
Dear Team, FTP Masters, Luca!

How do we handle packages that depend on large data for operation? See below.

On 19/04/13 17:00, Luca Falavigna wrote:
>
> Hi,
>
> according to README.Debian, this package requires the download of an
> external resource to work properly, so it must be targeted contrib.

I have a free gene ontology term predictor 'metastudent' from Tobias Hamp. It searches BLAST databases that were specially prepared for it.
These databases and some additional data files are packed up in a free (GPL-2+) tar.gz [1] that is over 400MB. In order to save space, we
decided not to package and upload the data (after initially packaging it). That now seems to force the package to 'contrib' [2].

[1] ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz
[2] http://www.debian.org/doc/debian-policy/ch-archive.html 2.2 Archive areas
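
To make the "download after installation" step concrete, here is a minimal sketch of what such a fetch could look like. It is only an illustration under assumptions: the function name fetch_and_unpack and the target directory are hypothetical, it is not how metastudent, getData or BioMaj actually do it, and only the URL comes from [1].

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path

# URL taken from reference [1]; everything else here is an assumption.
DATA_URL = "ftp://rostlab.org/metastudent/metastudent-data_1.0.0.tar.gz"


def fetch_and_unpack(url: str, dest: Path) -> list[str]:
    """Download a tar.gz from `url` and unpack it under `dest`.

    Returns the sorted names of the archive members.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "data.tar.gz"
        # urlretrieve handles ftp://, http(s):// and file:// URLs.
        urllib.request.urlretrieve(url, str(archive))
        with tarfile.open(archive, "r:gz") as tar:
            root = dest.resolve()
            # Basic check: refuse member names that would land outside `dest`.
            for member in tar.getmembers():
                target = (dest / member.name).resolve()
                if target != root and root not in target.parents:
                    raise ValueError(f"unsafe path in archive: {member.name}")
            names = sorted(m.name for m in tar.getmembers())
            tar.extractall(dest)
    return names


# Example call (hypothetical per-user directory, not a path metastudent uses):
# fetch_and_unpack(DATA_URL, Path.home() / ".metastudent-data")
```

Whether a step like this runs from a maintainer script or is left to the user is exactly the policy question here; the sketch only shows that the mechanics are small.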

* I doubt the package contains everything needed to /generate/ those data files (@Tobias: does it?).

* Without the data files the package would not be broken, but it would not be useful: it would not perform its function.

* predictprotein, by the way, could also not perform its function fully, and would not be useful, without large BLAST and other databases (that
have to be downloaded [3]). predictprotein does have all the tools and instructions packaged to obtain the required databases, though.

[3] http://wiki.debian.org/DebianMed/PredictProtein

My questions are:

* Do we have a team policy for such packages that depend on large data? Where should the data go?

* Should metastudent, and consequently predictprotein, go into 'contrib'?

* Do you see a way - apart from creating a 400MB package for the metastudent data, and a several-gigabyte one for predictprotein - to keep
these in 'main', and therefore in the distribution?

There was a discussion of this issue during the DPL vote [4]: whether software that is not useful without an Internet connection could be in
'main'. Bart Martens suggested the interpretation that if a package installs software from outside the distribution on the local system, then it
should not be in 'main' [5]. Russ Allbery wrote that point #1 of the social contract is relevant (and canonical) [6]. I interpret point #1 as
'as long as no non-free software is installed on the system by the package, a package can be part of the Debian system'.
In my interpretation, point 1 of the social contract [7] does indeed seem to allow metastudent to be in 'main', provided the data is DFSG-free (I
think it is).

[4] http://lists.debian.org/debian-vote/2013/03/msg00249.html
[5] http://lists.debian.org/debian-vote/2013/03/msg00276.html
[6] http://lists.debian.org/debian-vote/2013/03/msg00279.html
[7] http://www.debian.org/social_contract

Your thoughts and suggestions are welcome.

Best regards,
Laszlo

