[pymvpa] Train and test on different classes from a dataset

Yaroslav Halchenko debian at onerussian.com
Thu Jan 31 20:55:18 UTC 2013


On Thu, 31 Jan 2013, J.A. Etzel wrote:
> >On Thu, 31 Jan 2013, J.A. Etzel wrote:

> >There it was not only about "test set" but about the "whole-dataset"
> >(i.e. training+test sets).

> I think that's what I mean as well - permute both the training and
> testing set labels.

> Why not, say, if partitioning on the runs, randomize the labels
> within each run then perform the classification? That is, the labels
> are permuted in the entire dataset (within each run, since that's a
> meaningful subdivision/source of variance), then the permuted-label
> dataset is treated (i.e. same partitioning/classification) in the
> same way as the real data?

That example in the documentation permutes the "whole-dataset"
without any partitioning on the runs. And yes, I think it should be
sufficient to "permute" the entire dataset if you do the permutation
within runs (i.e. without breaking the balance of labels across runs) AND
maintain the dependence between samples within each run if you have
more than one sample of a class per run.
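
For illustration, a minimal NumPy sketch of plain within-run permutation:
every run keeps its own multiset of labels, so the balance of labels across
runs is preserved. This is not PyMVPA's API; `permute_within_runs` is a
hypothetical helper, and it assumes one label and one run id per sample.

```python
import numpy as np


def permute_within_runs(labels, runs, rng=None):
    """Shuffle labels independently within each run (chunk).

    Hypothetical helper, not part of PyMVPA. Each run ends up with the
    same labels it started with, only reassigned among its own samples.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    runs = np.asarray(runs)
    permuted = labels.copy()
    for run in np.unique(runs):
        idx = np.flatnonzero(runs == run)  # samples belonging to this run
        permuted[idx] = rng.permutation(labels[idx])
    return permuted
```

E.g. on labels `['face', 'house', 'face', 'house']` with runs `[1, 1, 2, 2]`,
each run of the result still holds exactly one 'face' and one 'house'.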

I.e. samples in a run that were of the same category had better stay
of the same (possibly different) category, since they are not really
independent of each other. This can be achieved by permuting the
categories among themselves (i.e. for a binary task, deciding for each
run whether or not to swap the two labels). Granted, this would not work
well if you have only a few functional runs, since there are then only a
few distinct permutations, but being this conservative should eliminate
any possible optimistic bias coming from ignoring dependences among
samples caused not only by the categorical information itself but also
by the HRF (e.g. consecutive volumes in a block design).
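
A sketch of this more conservative scheme, again as standalone NumPy rather
than the unreleased PyMVPA permutator, with a hypothetical helper name. It
assumes every run contains the same set of categories with roughly equal
counts; categories are only ever permuted among those present in a run.

```python
import numpy as np


def permute_categories_within_runs(labels, runs, rng=None):
    """Relabel whole categories within each run (hypothetical helper).

    For each run, the run's unique categories are mapped onto a random
    permutation of themselves, so all samples sharing a category in that
    run receive the same new category. For a binary task this amounts to
    deciding, per run, whether or not to swap the two labels.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    runs = np.asarray(runs)
    permuted = labels.copy()
    for run in np.unique(runs):
        idx = np.flatnonzero(runs == run)
        cats = np.unique(labels[idx])  # categories present in this run
        mapping = dict(zip(cats, rng.permutation(cats)))
        permuted[idx] = [mapping[c] for c in labels[idx]]
    return permuted
```

With R runs in a binary task there are only 2**R distinct relabelings
(e.g. 16 for 4 runs, counting the unpermuted one), which is why this does
not work well with only a few runs.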

For runs with only a single sample per category, it is identical
to regular permutation with limit='chunks'. For other cases, PyMVPA
has such a permutator only in Git, not yet released; the documentation
will be adjusted accordingly upon the next release ;)
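
A brute-force check of that equivalence on a single run, using only
`itertools` (the helper names are hypothetical): with one sample per
category, relabeling categories reaches exactly the same labelings as
shuffling samples, while with two samples per category it reaches far fewer.

```python
from itertools import permutations


def sample_shuffles(run_labels):
    """All label vectors reachable by shuffling samples within one run."""
    return set(permutations(run_labels))


def category_relabelings(run_labels):
    """All label vectors reachable by permuting the run's categories."""
    cats = sorted(set(run_labels))
    out = set()
    for perm in permutations(cats):
        mapping = dict(zip(cats, perm))  # bijection category -> category
        out.add(tuple(mapping[c] for c in run_labels))
    return out


one_each = ('face', 'house', 'chair')
two_each = ('face', 'face', 'house', 'house')
print(sample_shuffles(one_each) == category_relabelings(one_each))
print(len(sample_shuffles(two_each)), len(category_relabelings(two_each)))
```

For ('face', 'face', 'house', 'house') sample shuffling yields 6 distinct
labelings, but category relabeling only 2 (keep or swap).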

-- 
Yaroslav O. Halchenko
Postdoctoral Fellow,   Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        



More information about the Pkg-ExpPsy-PyMVPA mailing list