[pymvpa] [mvpa-toolbox] Below chance performance

Yaroslav Halchenko yoh at psychology.rutgers.edu
Thu Feb 12 03:33:10 UTC 2009


imho, various factors can contribute to such behavior, but here you have
unintentionally hit an interesting issue: estimating the "by-chance"
performance distribution...

as you can see, it is probably quite different from what people take it
to be under the assumption of Bernoulli trials, which in turn differs
from what we actually observe with classifiers, whose performance
depends on additional factors (e.g. the distribution of the data, the
schedule of generalization estimation, optimization stopping, inherent
feature selection if any is present, etc.).  I mention this just to
raise critical awareness of the results in various journals where
authors get 0.6 correct performance on two classes and claim 'presence
of the signal' because it is significantly different from 'by chance'
performance.  Personally, I would be ashamed to publish and draw
conclusions after obtaining such a performance.
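
To make that concrete (a minimal sketch of my own, not tied to PyMVPA or
the MVPA toolbox, with illustrative numbers): the 'significance' that
such claims rest on is just a binomial tail probability, e.g.

    from scipy.stats import binom

    # Under the (often inappropriate) assumption of independent
    # Bernoulli trials, the probability that a coin-flipping classifier
    # gets at least 60 of 100 two-class trials correct:
    n_trials, n_correct, p_chance = 100, 60, 0.5  # illustrative numbers
    p = binom.sf(n_correct - 1, n_trials, p_chance)  # P(X >= n_correct)
    print(p)  # ~0.028 -- nominally "significant"

and that number is only meaningful to the extent that the Bernoulli
assumption actually holds for the classifier at hand.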

Permutation testing to assess the significance of the result is often
capable of addressing this issue to some degree, but not always (once
again depending on the distribution of the data, the generalization
estimation, etc.).
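
For completeness, a minimal sketch of label-permutation testing in plain
numpy (train_and_score and its arguments are hypothetical stand-ins for
whatever cross-validation pipeline you actually use; this is not a
PyMVPA API):

    import numpy as np

    def permutation_pvalue(train_and_score, X, y, n_perm=1000, seed=0):
        # train_and_score(X, y) -> cross-validated accuracy (placeholder
        # for your actual classifier + generalization-estimation scheme)
        rng = np.random.RandomState(seed)
        observed = train_and_score(X, y)
        null = np.empty(n_perm)
        for i in range(n_perm):
            # Permuting the labels destroys any real X->y relationship,
            # so the resulting scores sample the "by chance" distribution
            # for *this* classifier, data, and cross-validation schedule.
            null[i] = train_and_score(X, rng.permutation(y))
        # fraction of permutations scoring at least as well as observed
        p = (1 + np.sum(null >= observed)) / float(1 + n_perm)
        return observed, null, p

The null distribution obtained this way can be noticeably wider or
skewed relative to the binomial one, which is exactly the issue above.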

Also, for one example of how systematic misclassification (in other
words, an anti-learning effect) can arise, see

An Analysis of the Anti-Learning Phenomenon
for the Class Symmetric Polyhedron
Adam Kowalczyk and Olivier Chapelle
http://eprints.pascal-network.org/archive/00001154/01/alt_05.pdf

but I doubt that this is the case with your data.

P.S. I am doing evil and cross-posting to the PyMVPA mailing list, since
this discussion might be of interest to its readers as well ;-)

On Wed, 11 Feb 2009, Paul Krueger wrote:

>    Has anyone experienced getting below chance performance?  I'm
>    consistently getting below chance performance, even if I feed random
>    noise as data.  I thought there might be a problem with our code that
>    made regressors, which could have explained below chance performance on
>    our data (and could also conceivably account for below chance
>    performance with random noise if the GNB pre-classifier found some
>    patterns by chance and then our regressor code somehow messed up
>    training/testing of those patterns).  But our regressor code is fine
>    now, and when I feed random noise I use a tiny 5x5x5 mask (which is
>    very unlikely to have chance patterns in it), so I really can't
>    understand how we could be getting consistently below chance.  Another
>    strange thing is the amount of variation in performance across
>    iterations -- e.g., if chance = 0.25 I might sometimes observe
>    performance values across 12 iterations ranging from 0.0 to 0.5!  It
>    is somewhat unclear whether this variation depends on the data I use
>    or on the max number of epochs.
>    Does anyone have any ideas about what might be causing this?
>    Paul

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        


