[pymvpa] Balancing strategy

Sat Dec 7 02:26:55 UTC 2013

I'm not sure I fully understand the question, so to summarize: you want 
to do leave-one-run-out cross-validation. You don't want to include all 
scans, because of poor image quality. But omitting scans causes imbalance.

I think your concern is that, after getting rid of the bad images, you end 
up with unequal numbers of examples of each task type in each run? If the 
imbalance isn't too bad (e.g. 10 examples of one class in the run, 8 of the 
other), my usual strategy is to subset the larger class (e.g. only using 8 
of the 10 examples). Since there are many ways to do the subsetting, I 
usually suggest doing 10 different random subsets (e.g. in R notation, 
examples c(1:6,9,10) for one subset and 2:9 for another) and averaging the 
results over the subsets. But if the imbalance is quite bad (e.g. only 1 or 
2 examples of a class left in a run) I sometimes change the partitioning 
(e.g. leave-two-sequentially-presented-runs-out) to get the balance a bit 
closer.
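
To make the subsetting idea concrete, here is a rough sketch in plain 
Python (standard library only, no PyMVPA calls). The names balanced_subset, 
average_over_subsets and score_fn are made up for illustration; score_fn is 
a placeholder for whatever runs the leave-one-run-out cross-validation on 
the kept examples and returns an accuracy. It assumes every class still has 
at least one example in every run.

```python
import random
from collections import defaultdict


def balanced_subset(labels, runs, rng):
    """Pick example indices so each class has equal counts within each run.

    Within every run, each class is randomly cut down to the size of the
    smallest class present in that run (e.g. 10 vs. 8 examples -> 8 vs. 8).
    """
    by_run_class = defaultdict(lambda: defaultdict(list))
    for i, (lab, run) in enumerate(zip(labels, runs)):
        by_run_class[run][lab].append(i)

    keep = []
    for by_class in by_run_class.values():
        # Size of the smallest class that appears in this run.
        n_keep = min(len(idx) for idx in by_class.values())
        for idx in by_class.values():
            keep.extend(rng.sample(idx, n_keep))
    return sorted(keep)


def average_over_subsets(labels, runs, score_fn, n_subsets=10, seed=0):
    """Average score_fn over several different random balanced subsets.

    score_fn is a hypothetical callback: it receives the list of kept
    example indices and returns one number (e.g. mean CV accuracy).
    """
    rng = random.Random(seed)
    scores = [score_fn(balanced_subset(labels, runs, rng))
              for _ in range(n_subsets)]
    return sum(scores) / len(scores)
```

Each call to balanced_subset draws a different random subset, so the 
average is less dependent on which particular examples happened to be 
dropped. If a run has only 1 or 2 examples of a class, this throws away 
most of the other class, which is the situation where changing the 
partitioning is probably the better option.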

Not hard-and-fast rules, but I hope it helps,
Jo
