[pymvpa] advice for constructing a dataset for use in pyMVPA

Jo Etzel j.a.etzel at med.umcg.nl
Thu Feb 12 16:09:36 UTC 2009


Thank you very much for your helpful advice and comments!


>> are you doing cross-fold validation?  if so, using subjects as chunks 
>> would mean you train the classifier on one set of subjects and use it 
>> to classify the one(s) left out.  typically you don't want to do this, 
>> so i would say keep each subject as a completely separate dataset and 
>> use runs as chunks, then decide how you want to compare the output over 
>> all subjects (i.e., t-test, what have you...)
> 
> the setup of the analysis depends on what is actually believed to be the
> effect -- i.e., if it is as in a GLM -- uniform activation of areas --
> then processing the full dataset (all subjects) while holding a subject
> out for cross-validation might be quite reasonable. If the effects are
> locally distributed codes, then IMHO I would start by processing each
> subject separately, hence holding out a single run for cross-validation.
> At the end it is possible to simply sum all confusion matrices to get a
> 'summary' confusion matrix across all subjects
> 
I analyze this data at both the single-subject level and between 
subjects (for different purposes). So it sounds like when I want to test 
between subjects I should use subjects as chunks (or concatenate, as you 
mentioned), and otherwise use blocks or runs as chunks, depending on 
what I want to hold out for cross-validation.
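The 'summary' confusion matrix idea above can be sketched with plain numpy (the per-subject matrices here are made up for illustration): each subject gets a confusion matrix from leave-one-run-out cross-validation, and the matrices are simply summed across subjects.

```python
import numpy as np

# Hypothetical per-subject confusion matrices from leave-one-run-out
# cross-validation (rows = true label, columns = predicted label),
# for a binary task such as color: red vs. blue.
subject_confusions = [
    np.array([[8, 2],
              [3, 7]]),
    np.array([[9, 1],
              [2, 8]]),
    np.array([[7, 3],
              [4, 6]]),
]

# Summing the matrices gives a 'summary' confusion matrix across subjects.
summary = np.sum(subject_confusions, axis=0)
print(summary)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy = np.trace(summary) / float(summary.sum())
print(round(accuracy, 3))
```

Because the matrices are counts, summing pools all predictions before computing accuracy, which weights subjects by their number of samples.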


>>> "conf", etc.). I never need to classify on more than one text label at a 
>>> time (i.e. just "color", not "color" and "conf"), though I do need to 
> so do you do 1-class classification? libsvm has that facility, but I've
> never used it within PyMVPA, so I'm not sure about side-effects ;-)
> or did you mean "color" - vs - all_other_labels?
> 

No - I was unclear. These are all binary classifications (e.g. color can 
be red or blue, conf N or R).


>>> In this case, I think that the "samples" are my blocks
> do you mean that you want to look at average activation within a block
> as a single sample?

Yes; I have already processed my data to have one (Analyze-format) image 
per block per subject.
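For anyone who has not yet done that preprocessing step, here is a minimal numpy sketch of the idea (toy arrays, not real fMRI data): average the volumes belonging to each block so that each block becomes a single sample.

```python
import numpy as np

# Toy data: 6 scans x 4 voxels; block_ids says which block each scan
# belongs to (these arrays are made up for illustration).
volumes = np.arange(24, dtype=float).reshape(6, 4)
block_ids = np.array([0, 0, 0, 1, 1, 1])

# Average within each block -- one pattern per block, i.e. one sample.
block_means = np.vstack([volumes[block_ids == b].mean(axis=0)
                         for b in np.unique(block_ids)])
print(block_means.shape)
```

The resulting array has one row per block, which can then be fed to a classifier as the samples.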


>>> 4 - write Python code to create my NiftiDataset object, using my Analyze 
>>> image (0 for voxels to exclude, >0 for voxels to include) as a "mask" if 
>>> I want to restrict my analysis to those voxels.
> you can load an Analyze image with NiftiDataset as easily as NIfTI
> files, so there is no need to convert it manually, BUT it might be
> desirable to convert the mask and all data to NIfTI first and check that
> they remain correct (i.e., no evil surprises due to incorrect
> orientation, slice order, etc.)
Good advice!
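The mask convention described above (0 = exclude, >0 = include) boils down to boolean indexing over the flattened voxels. A minimal numpy sketch with made-up arrays, roughly what happens internally when a mask image is supplied to a dataset:

```python
import numpy as np

# Toy mask following the convention above: 0 = exclude, >0 = include.
mask = np.array([[0, 2],
                 [1, 0]])               # a tiny 2x2 "volume"

# Two samples, with the volume's voxels flattened into one row each.
data = np.array([[10., 20., 30., 40.],
                 [50., 60., 70., 80.]])

# Keep only the voxels where the mask is positive.
kept = data[:, mask.ravel() > 0]
print(kept)
```

Any positive mask value selects the voxel, which is why an ROI image with integer region labels works as a mask without thresholding it to 0/1 first.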
