[Neurodebian-upstream] [Nipy-devel] Standard dataset

Michael Hanke michael.hanke at gmail.com
Tue Sep 21 23:48:06 UTC 2010


Hi,

On Tue, Sep 21, 2010 at 01:54:09PM -0700, Matthew Brett wrote:
> It seems to me that 3) - the tests - have to be configured by the
> individual software packages.

If they are package-specific unit tests (or regression tests). But we
also want comparative tests that check multiple implementations for
similar or identical output -- think: 15 DICOM->NIfTI conversion
implementations should ideally all produce identical output, but right
now they are often not even in the same ballpark. That is a meta-test
that lives outside of any single 'upstream' package.
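Such a meta-test could, very roughly, look like the sketch below. Everything here is hypothetical: the converter functions are stand-ins for real DICOM->NIfTI tools, and agreement is checked naively via checksums of the produced files.

```python
import hashlib

# Hypothetical stand-ins: one function per converter implementation,
# each returning the bytes of the NIfTI file produced from the same
# DICOM input. Real tools would be invoked via subprocess instead.
def convert_with_tool_a(dicom_dir):
    return b"nifti-bytes"

def convert_with_tool_b(dicom_dir):
    return b"nifti-bytes"

CONVERTERS = {"tool_a": convert_with_tool_a, "tool_b": convert_with_tool_b}

def compare_conversions(dicom_dir):
    """Map each converter name to a checksum of its output."""
    return {name: hashlib.sha256(conv(dicom_dir)).hexdigest()
            for name, conv in CONVERTERS.items()}

results = compare_conversions("/path/to/dicoms")
# All implementations agree iff all checksums are identical.
all_agree = len(set(results.values())) == 1
```

In practice a byte-level comparison is too strict (headers may legitimately differ), so a real meta-test would compare voxel data and selected header fields instead.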

<snip>

> a) The user has the name and version of a data package they want from
> the internet.  They can install a data package matching that name and
> version

Getting it shouldn't be a problem -- although it would need a robust
distribution mechanism. The whole neuroimaging world downloading
TB-sized datasets from a single machine probably wouldn't work well.

But the real problems start with the lack of a uniform way to install
that data package (in a yet-to-be-determined format).

> b) If the package is installed, there is an algorithm for the system
> to find what package is installed, and the version of that package.

This is a big problem on platforms without a 'package manager'. In
general it becomes feasible once you limit the scope to some form of
environment, e.g. Python, MacPorts, Cygwin, ... But for a global
solution this is a major problem.
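To illustrate what 'limiting the scope to the Python environment' buys you: if a data package were shipped as a Python distribution, its presence and version could be queried through the standard metadata machinery. A minimal sketch (the package name is made up):

```python
# Discovery scoped to the Python environment: query the metadata of
# installed distributions instead of inventing a global registry.
from importlib import metadata

def find_data_package(name):
    """Return the installed version of a data package, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Hypothetical package name -- returns None unless such a
# distribution is actually installed:
version = find_data_package("neuro-testdata-dicom")
```

Outside such an environment there is no comparable standard query interface, which is exactly the problem described above.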

> c) The user can install the package into any named location, and tell
> the system where to look for the data package
> d) The user can install the data package as root so that the algorithm
> in b) can find the package
> e) ditto as non-root

I take those to mean: it should be a non-chaotic system. Or were they
aimed at something more specific?

> f) A user can create their own data package locally

This is a must. Also related to the extensibility of the system.

> g) The user can install their local data package in the same way as a
> remote package

Not sure if it has to be exactly identical -- think: APT vs. dpkg. But
the way those 'packages' are registered in the system should make no
difference between local and remote ones.

> h) The user can allow the system to find the local package without
> installation (develop mode)

I don't see how this would work -- or I have a different concept of an
'installation'. If you set PYTHONPATH to point at a Python module
source tree, you have effectively installed it (just without copying
anything into system paths). How could any system know about the
presence of a package without installation?
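In that sense, 'develop mode' might just mean the lookup algorithm consults an extra search path before the system-wide registry -- the PYTHONPATH analogy made explicit. A sketch, assuming a hypothetical environment variable:

```python
import os

# One way 'develop mode' could work: the lookup consults an
# environment variable (name hypothetical) listing extra search roots,
# so a local source tree is found without any copy/registration step
# -- analogous to PYTHONPATH for Python modules.
def locate_data_package(name, env_var="NIDATA_PATH"):
    """Search the roots listed in env_var for a directory named
    after the package; return its path, or None if not found."""
    for root in os.environ.get(env_var, "").split(os.pathsep):
        candidate = os.path.join(root, name)
        if root and os.path.isdir(candidate):
            return candidate
    return None  # a real system would fall back to system-wide lookup
```

So the tree still gets 'installed' in some minimal sense -- it is registered via the environment, just not copied anywhere.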

> i) The user can upload their package somewhere such that another user
> can find the package as in a)

This is a must. Although not everybody may be allowed to upload to any
location (obviously) -- but there has to be a common distribution
format/channel.

> So, actually, our original draft was meant to try and deal with at
> least some of these problems in an language neutral way.  By language
> neutral, I mean, you might need python installed to install the
> package, but you can use the package from any language.

Hmm, so you are aiming at some form of data package manager implemented
in Python -- one that would still have to be written? How would you
implement the link between data versions and software versions? For
example: AFNI as of yesterday needs a dataset with NIfTI files that
have a new magic header for its regression tests (hypothetical use
case) -- previous versions would be fine. That would only work (IMHO)
if the regression test is part of AFNI, and AFNI is aware of the data
package manager and knows how to make it fetch the right data. And it
would also be AFNI's duty to do that in a platform-appropriate way.
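The version-coupling part could be as simple as the software shipping a declared minimum data version and asking the data package manager whether it is satisfied. A sketch under invented names (the package name, version format, and 'manager database' are all made up):

```python
# Stand-in for the data package manager's record of installed data
# packages: name -> version tuple.
INSTALLED = {"testdata-nifti": (1, 2)}

def data_requirement_satisfied(name, min_version):
    """True if the named data package is installed at >= min_version."""
    installed = INSTALLED.get(name)
    return installed is not None and installed >= min_version

# AFNI-of-yesterday (hypothetical) needs the new-magic-header dataset,
# available from version 1.2 on:
ok = data_requirement_satisfied("testdata-nifti", (1, 2))
# A future requirement that is not yet installed would fail:
future_ok = data_requirement_satisfied("testdata-nifti", (1, 3))
```

The hard part is not this check but getting every upstream to declare such requirements and to talk to a common manager at all -- which is the duplication trade-off mentioned next.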

For AFNI developers it is probably a lot easier to simply ship the right
dataset with AFNI. At the distribution/integration level, however, we
would have to deal with data duplication, etc.

Michael

-- 
GPG key:  1024D/3144BE0F Michael Hanke
http://mih.voxindeserto.de


