[Pkg-exppsy-pynifti] Example data - a proposal

Matthew Brett matthew.brett at gmail.com
Sat Jul 11 18:28:44 UTC 2009


Hi,

As we're working on stuff, the problem of example data keeps coming
up.  We often need example data for

1) tests
2) examples

Of course we're often using images and these can be rather large.

The options are:

a) Download data from the web as tests / examples use it, maybe with
local caching and some hash check for remote changes (something like
what we have).
b) Check the data into the main code repository, so that it always
comes with the code
c) Put the data into another package and release it separately.
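
For concreteness, option a) might look roughly like the sketch below.
This is only an illustration, not what we currently have: the function
names, the cache layout and the choice of SHA-256 are all assumptions.

```python
import hashlib
import os
import shutil
import urllib.request


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(url, cache_dir, expected_sha256):
    """Hypothetical helper: download url into cache_dir unless a copy
    with the expected hash is already there."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, os.path.basename(url))
    # Cache hit: the local copy exists and matches the expected hash
    if os.path.isfile(target) and sha256_of(target) == expected_sha256:
        return target
    # Cache miss (or stale copy): download again
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    # Refuse to keep a file that does not match the expected hash
    if sha256_of(target) != expected_sha256:
        os.remove(target)
        raise ValueError("hash mismatch for %s" % url)
    return target
```

Note that every test or example then has to carry a URL and a hash for
each file it uses, which is part of what makes this approach painful.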

The problems that we have to address:

i) What to do about people with limited internet connectivity
(temporary or permanent)
ii) How to deal with changes in the example / data files in a way that
is efficient for bandwidth and time
iii) Keeping track of data versions as code and data evolve.

In practice, of course, we'll need all three.  Something huge, like
the FIAC dataset, is probably impractical to package and requires
something like a).  Images which are used daily in processing, like
templates, should be in the repository (b).  The
question is - what should the default be, when you want people to use
a particular smallish example image?

I'm proposing c) as the best way to go.  In particular, we can version
the data release so that examples and tests can check the version
number rather than file hashes and other painful things.  It avoids
having to keep large binary files in a code repository, making the
repository relatively light to check out and diff.  It should be simple
to package and distribute.  It gives us a very simple example case to
get our package and release strategy in order.
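
A version check of that kind could be as small as the sketch below.
It assumes the data package exposes a `__version__` string made of
dotted integers; that attribute and the helper names are hypothetical,
not something the package is known to provide.

```python
import importlib


def parse_version(text):
    """Turn '0.10.2' into (0, 10, 2) so versions compare numerically.

    Assumes purely numeric, dot-separated version strings.
    """
    return tuple(int(part) for part in text.split("."))


def data_version_ok(min_version, module_name="imagedata"):
    """Return True if the data package is importable and recent enough.

    Hypothetical helper: relies on an assumed __version__ attribute.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False  # data package not installed
    installed = getattr(module, "__version__", None)
    if installed is None:
        return False  # no version information to compare against
    return parse_version(installed) >= parse_version(min_version)
```

A test could then call something like `data_version_ok("0.2")` and
skip itself when the answer is False.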

I'm already using this when working with pynifti, where I made a
Python package containing only data, and then do this at the top of my
(MINC reading) test:

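from numpy.testing import dec  # assumption: dec is numpy.testing's decorators

try:
    import imagedata
except ImportError:
    # Data package missing: tests decorated with decimg are skipped
    decimg = dec.skipif(True, 'no imagedata package on python path')
else:
    # Data package present: decimg leaves the test unchanged
    decimg = lambda x: x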
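
The same skip-if-missing idea can be sketched with only the standard
library's unittest, swapped in here for the dec helper used above.
The names `module_available`, `needs_imagedata` and the test class
are made up for illustration.

```python
import importlib.util
import unittest


def module_available(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Decorator that skips a test when the data package is not installed
needs_imagedata = unittest.skipUnless(
    module_available("imagedata"),
    "no imagedata package on python path",
)


class TestMincReading(unittest.TestCase):

    @needs_imagedata
    def test_data_package_importable(self):
        # Only runs when the (assumed) imagedata package is installed
        import imagedata
        self.assertIsNotNone(imagedata)
```

Without the data package the test is reported as skipped rather than
failed, which is the behaviour we want for optional data.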

So, I propose we collect a carefully selected set of images we want to
use for examples and testing, package them in a trivial python package
something like this:

imagedata/
|-- imagedata
|   |-- __init__.py
|   |-- minc
|   |   `-- avg152T1.mnc
|   `-- nifti
|       `-- avg152T1.nii.gz
|-- setup.py
`-- setupegg.py

and use this package as a dry run for our packaging and release mechanisms.
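
To make such a package easy to use, its `__init__.py` could offer a
small lookup helper along these lines.  This is a sketch only: the
`data_path` function and its `root` argument are hypothetical, and the
real package may well do something different.

```python
import os


def data_path(*parts, root=None):
    """Return the absolute path of a data file inside the package.

    Hypothetical helper.  `parts` are path components relative to the
    package directory, e.g. ("nifti", "avg152T1.nii.gz").  `root`
    overrides the package directory, which is handy for testing.
    """
    if root is None:
        # Default: the directory containing this __init__.py
        root = os.path.dirname(os.path.abspath(__file__))
    path = os.path.join(root, *parts)
    if not os.path.isfile(path):
        raise IOError("no such data file: %s" % path)
    return path
```

Tests and examples could then ask for
`imagedata.data_path("nifti", "avg152T1.nii.gz")` instead of
hard-coding where the files live.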

Any thoughts from the team(s)?

See you,

Matthew


