[pymvpa] unbalanced datasets

Edmund Chong edmund.w.chong at gmail.com
Tue Aug 7 15:52:45 UTC 2012


Hi all,

I recently asked a question about dealing with unbalanced datasets, and
here's a follow-up.
Say I have empty runs, or runs with zero samples for one of the conditions.
This causes problems whenever such a run is the test run in a
leave-one-run-out cross-validation procedure.

My workaround was this: if I had one such run with an empty condition, I
would set NFoldPartitioner(cvtype=2), together with Balancer(), so that any
combination of two runs would have at least one sample per condition. If I
had two such runs, I would set cvtype=3, and so on. However, this means I
have less data in the training set on each classification fold.
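
To make the trade-off concrete, here is a minimal plain-Python sketch (not
PyMVPA code) that finds the smallest number of held-out runs for which every
test set contains every condition, and reports how many runs remain for
training. The function name min_cvtype, the 6-run design, and the per-run
sample counts are all made up for illustration.

```python
from itertools import combinations


def min_cvtype(counts, conditions):
    """Smallest number of held-out runs such that every possible test set
    contains at least one sample of every condition (None if impossible)."""
    runs = sorted(counts)
    for n in range(1, len(runs)):  # always keep at least one run for training
        if all(
            all(sum(counts[r].get(c, 0) for r in held_out) > 0 for c in conditions)
            for held_out in combinations(runs, n)
        ):
            return n
    return None


# Hypothetical design: 6 runs, 2 conditions; run 3 has no "B" samples.
counts = {r: {"A": 4, "B": 4} for r in range(1, 7)}
counts[3] = {"A": 4, "B": 0}

n = min_cvtype(counts, ["A", "B"])
n_folds = len(list(combinations(sorted(counts), n)))
print("held-out runs per fold:", n)
print("number of folds:", n_folds)
print("training runs per fold:", len(counts) - n)
```

In this made-up design, every additional run with an empty condition raises
the required number of held-out runs by one and removes one more run from
each training set.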

Is there any other possible solution for this? In fact, is it possible to
do leave-n-samples-out classification? That is, on each fold I would
randomly select n samples per condition to test on and use the remaining
samples (after balancing) for training, disregarding the chunk structure.
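
To pin down the procedure I have in mind, here is a rough plain-Python sketch
working on sample indices only. It is not PyMVPA API: the function name
leave_n_samples_out, its parameters, and the example labels are hypothetical,
and "balancing" is assumed to mean subsampling every condition down to the
size of the smallest one.

```python
import random
from collections import defaultdict


def leave_n_samples_out(labels, n_test, n_folds, seed=0):
    """Yield (train_idx, test_idx) index lists, ignoring chunks.

    Per fold: draw n_test random samples per condition for testing, then
    keep an equal number of the remaining samples per condition for training.
    """
    rng = random.Random(seed)
    by_cond = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cond[lab].append(idx)
    smallest = min(len(idxs) for idxs in by_cond.values())
    if smallest <= n_test:
        raise ValueError("each condition needs more than n_test samples")
    n_train = smallest - n_test  # balanced training size per condition
    for _ in range(n_folds):
        train, test = [], []
        for idxs in by_cond.values():
            shuffled = rng.sample(idxs, len(idxs))  # shuffled copy
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:n_test + n_train])
        yield sorted(train), sorted(test)


# Hypothetical unbalanced labels: 10 samples of "A", 6 of "B".
labels = ["A"] * 10 + ["B"] * 6
folds = list(leave_n_samples_out(labels, n_test=2, n_folds=5))
for train_idx, test_idx in folds:
    print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Because the folds are drawn independently, a sample can show up in the test
set of more than one fold. Training and test samples may also come from the
same run, which is exactly the part of the chunk structure being disregarded.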

Thanks!
-Edmund