[pymvpa] using Balancer()

Tren Huang tren.huang at gmail.com
Thu Jul 26 14:53:04 UTC 2012


> > Also, what happens if in some runs, I have no trials for a certain
> > condition? I imagine that on folds where those runs are part of the "test
> > dataset", this would be problematic (nothing to test against)?  So are
> > those fold entirely excluded from the analysis if I use Balancer?
>
> I don't think it would exclude any runs entirely, but you would be
> lacking something to test on, indeed.
>

Just to add to Michael's reply: I think what happens is something like the
following situation. Please correct me if I'm wrong.

Say you have 5 samples in total:

Category A: samples #1, #2
Category B: samples #3, #4
Category C: sample #5

During leave-one-out cross-validation, each sample in turn is picked as
the one for testing:

When sample #1 is the one, the Balancer would choose either (#2, #3, #5) or
(#2, #4, #5) for training. Chance level = 100%/3 = 33.3% correct.

When sample #5 is the one, the Balancer would choose among (#1, #3), (#1,
#4), (#2, #3), (#2, #4) for training. Chance level = 100%/2 = 50% correct,
but you actually get 0% classification accuracy, because the trained
classifier always predicts either category A or B while the ground truth is
category C.
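To make the fold-by-fold reasoning concrete, here is a minimal plain-Python sketch (not PyMVPA's Balancer API) that enumerates the leave-one-out folds for the toy dataset above. It assumes, as in the example, that balancing keeps exactly one sample per category present in the training set; the names `samples`, `balanced_training_sets` and `describe_fold` are hypothetical.

```python
from itertools import product

# Toy dataset from the example: sample number -> category label
samples = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}

def balanced_training_sets(train):
    """Yield every training set keeping exactly one sample per category.

    Assumption: this mirrors the one-per-category balancing described above;
    it is an illustration, not PyMVPA's Balancer implementation.
    """
    by_cat = {}
    for s, cat in sorted(train.items()):
        by_cat.setdefault(cat, []).append(s)
    # One choice per category, all combinations
    for combo in product(*(by_cat[c] for c in sorted(by_cat))):
        yield tuple(sorted(combo))

def describe_fold(test_id):
    """Return (balanced training sets, naive chance level, test class seen?)."""
    train = {s: c for s, c in samples.items() if s != test_id}
    cats_in_train = set(train.values())
    sets = list(balanced_training_sets(train))
    chance = 1.0 / len(cats_in_train)            # 1 / number of trainable classes
    testable = samples[test_id] in cats_in_train  # False -> accuracy is always 0%
    return sets, chance, testable

for test_id in samples:
    sets, chance, testable = describe_fold(test_id)
    status = "testable" if testable else "test class unseen -> 0% accuracy"
    print(test_id, sets, f"chance={chance:.1%}", status)
```

For sample #1 this lists (2, 3, 5) and (2, 4, 5) at 33.3%, and for sample #5 it lists the four A/B pairs at 50% while flagging that category C never appears in training.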

Because the chance level may change from fold to fold in an unbalanced
dataset like this, running the same procedure with the same category labels
on white noise may give you a better estimate of chance classification
performance.
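A rough sketch of that white-noise check, again in plain Python rather than PyMVPA: the features are replaced by Gaussian noise, the same leave-one-out and one-per-category balancing is applied, and accuracy is averaged over many repetitions. The nearest-centroid classifier, the number of features, the repetition count and the seed are all hypothetical choices made for illustration. For this toy design the estimate should come out near 4/15 (about 26.7%), below both the 33.3% and 50% per-fold figures, because the fold that tests sample #5 can never be correct.

```python
import random
import statistics

# Same toy design as above: sample number -> category label
labels = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}

def nearest_centroid_predict(train_x, train_y, x):
    """Hypothetical stand-in classifier: closest class-mean vector wins."""
    cents = {}
    for c in set(train_y):
        pts = [v for v, y in zip(train_x, train_y) if y == c]
        cents[c] = [statistics.fmean(col) for col in zip(*pts)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(sorted(cents), key=lambda c: dist2(cents[c], x))

def noise_chance_level(n_reps=2000, n_features=10, seed=0):
    """Leave-one-out accuracy on pure noise, averaged over n_reps datasets."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_reps):
        # White noise: features carry no information about the labels
        data = {s: [rng.gauss(0.0, 1.0) for _ in range(n_features)] for s in labels}
        for test_id in labels:
            by_cat = {}
            for s in labels:
                if s != test_id:
                    by_cat.setdefault(labels[s], []).append(s)
            # Balance: keep one randomly chosen sample per category in training
            chosen = [rng.choice(by_cat[c]) for c in sorted(by_cat)]
            pred = nearest_centroid_predict([data[s] for s in chosen],
                                            [labels[s] for s in chosen],
                                            data[test_id])
            correct += pred == labels[test_id]
            total += 1
    return correct / total

print(f"empirical chance accuracy: {noise_chance_level():.3f}")
```

With real data, the same idea applies: keep your actual labels and cross-validation scheme, swap the data for noise, and compare your observed accuracy against the resulting distribution rather than against a nominal 1/k.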

Tren


More information about the Pkg-ExpPsy-PyMVPA mailing list