[pymvpa] Biased estimates by leave-one-out cross-validations in PyMVPA 2

Yaroslav Halchenko debian at onerussian.com
Fri Apr 20 18:34:47 UTC 2012


If we were to talk about bias, we would be talking about classification of
true effects ;)

You are trying to learn/classify noise on imbalanced training sets -- since
you have 'events' == range(200), each sample/event is taken out separately,
so every training set contains 100 samples of one target (say 1) and 99 of
the other (say 0).  Since the data is pure noise, the classifier may simply
predict whichever target has the majority of samples (regardless of the
actual data), because that minimizes its objective function during
training.  But the held-out sample always belongs to the minority class of
that training set, so the majority prediction is systematically wrong for
it -- hence this "anti-learning" effect and below-chance accuracy.  That is
why we usually suggest ensuring an equal number of samples per category in
each training set (or chaining with a Balancer generator if the data is
imbalanced).
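
To make the mechanism concrete, here is a numpy-only sketch (no PyMVPA, and
the majority-vote "classifier" is a deliberately degenerate stand-in for
what an SVM tends toward on pure noise) of leave-one-out on a dataset that
is balanced overall:

```python
import numpy as np

def majority_vote_loo(targets):
    """Leave-one-out CV with a degenerate classifier that always
    predicts the majority class of its training set -- the solution
    a classifier drifts toward when the features are pure noise."""
    n_correct = 0
    for i in range(len(targets)):
        train = np.delete(targets, i)       # 199 samples: 100 vs 99
        pred = np.bincount(train).argmax()  # majority class of training set
        # the held-out sample is always in the training-set minority,
        # so this prediction is always wrong
        n_correct += int(pred == targets[i])
    return n_correct / float(len(targets))

# 200 samples, perfectly balanced overall (as in the code below)
targets = np.remainder(np.arange(200), 2)
print(majority_vote_loo(targets))  # 0.0 -- every fold misclassified
```

A real SVM on noise only partly follows this degenerate rule, which is why
the observed accuracy is ~40% rather than 0%.  With the 4-fold 'chunks'
partitioning each training set is perfectly balanced (75 vs 75), so the
effect disappears and accuracy sits at ~50%.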

On Fri, 20 Apr 2012, Ping-Hui Chiu wrote:

>    Dear PyMVPA experts,
>    Isn't a leave-one-out cross-validation supposed to produce a smaller bias
>    yet a larger variance in comparison to N-fold cross-validations when N<#
>    of samples?

>    I ran a sanity check on binary classification of 200 random samples.
>    4-fold cross-validations produced unbiased estimates (~50% correct),
>    whereas leave-one-out cross-validations consistently produced
>    below-than-chance classification performances (~40% correct). Why?

>    Any insight on this will be highly appreciated!

>    My code is listed below:

>    from mvpa2.suite import *
>    clf = LinearCSVMC()
>    cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))
>    cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))
>    acc_chunks=[]
>    acc_events=[]
>    for i in range(200):
>        print i
>        ds=Dataset(np.random.rand(200))
>        ds.sa['targets']=np.remainder(range(200),2)
>        ds.sa['events']=range(200)
>        ds.sa['chunks']=np.concatenate((np.ones(50),np.ones(50)*2,np.ones(50)*3,np.ones(50)*4))
>        ds_chunks=cv_chunks(ds)
>        acc_chunks.append(1-np.mean(ds_chunks))
>        ds_events=cv_events(ds)
>        acc_events.append(1-np.mean(ds_events))

>    >>>print np.mean(acc_chunks), np.std(acc_chunks)
>    0.50025 0.0442542370853
>    >>>print np.mean(acc_events), np.std(acc_events)
>    0.40674 0.189247516232

>    Thanks!
>    Dale
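
P.S. To illustrate what rebalancing each training set buys you in the
leave-one-out case above: if one sample of the majority class is also
dropped from each fold's training set (the idea behind chaining a Balancer
with the partitioner), the degenerate majority-vote solution is no longer
systematically wrong.  A numpy-only sketch -- the rebalancing here is a
hypothetical stand-in, not PyMVPA's actual Balancer:

```python
import numpy as np

def balanced_majority_vote_loo(targets):
    """Leave-one-out with a majority-vote classifier, but each training
    set is first rebalanced by also dropping one sample of the class
    opposite to the held-out sample."""
    n_correct = 0
    for i in range(len(targets)):
        train = np.delete(targets, i)                  # 100 vs 99
        drop = np.flatnonzero(train != targets[i])[0]  # one majority sample
        train = np.delete(train, drop)                 # now 99 vs 99
        # with a balanced training set the vote is a tie; argmax breaks
        # ties toward class 0, so the prediction is no longer
        # systematically biased against the held-out sample
        pred = np.bincount(train).argmax()
        n_correct += int(pred == targets[i])
    return n_correct / float(len(targets))

targets = np.remainder(np.arange(200), 2)
print(balanced_majority_vote_loo(targets))  # 0.5 -- chance level
```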


> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa


-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic


