Thanks Yaroslav! The previous results make sense now. <br><br>I have a related question: after feature selection on totally random samples, my binary classification accuracy was significantly better than chance (50%). For MVPA with feature selection on real fMRI data, how do we know whether better-than-chance performance reflects true effects or is just an artifact of feature selection?<br>
<br>My code with feature selection is listed below:<br><br>from mvpa2.suite import *<br>fsel = SensitivityBasedFeatureSelection(OneWayAnova(), FixedNElementTailSelector(25, mode='select', tail='upper'))<br>clf = LinearCSVMC()<br>
cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))<br>cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))<br>acc_chunks = []<br>acc_events = []<br>for i in range(100):<br> print i<br>
 ds = Dataset(np.random.rand(200, 100))<br> ds.sa['targets'] = np.remainder(range(200), 2)<br> ds.sa['events'] = range(200)<br> ds.sa['chunks'] = np.concatenate((np.ones(50), np.ones(50)*2, np.ones(50)*3, np.ones(50)*4))<br>
 fsel.train(ds)<br> ds = fsel(ds)<br> ds_chunks = cv_chunks(ds)<br> acc_chunks.append(1 - np.mean(ds_chunks))<br> ds_events = cv_events(ds)<br> acc_events.append(1 - np.mean(ds_events))<br><br>>>> print np.mean(acc_chunks), np.std(acc_chunks)<br>
0.6366 0.0350633712013<br><br>>>> print np.mean(acc_events), np.std(acc_events)<br>0.6405 0.0350820466906<br><br>Thanks!<br>Dale<br><br><div class="gmail_quote">On Fri, Apr 20, 2012 at 12:34 PM, Yaroslav Halchenko <span dir="ltr"><<a href="mailto:debian@onerussian.com">debian@onerussian.com</a>></span> wrote:<br>
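To frame the question, here is a minimal NumPy-only sketch (not using PyMVPA; `top_features` and the nearest-class-mean classifier below are my simplified stand-ins for OneWayAnova scoring and LinearCSVMC) contrasting feature selection fit on the whole dataset against selection refit inside each training fold, on pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))   # pure noise: 200 samples, 100 features
y = np.arange(200) % 2                # balanced binary targets
folds = np.repeat(np.arange(4), 50)   # 4 chunks of 50, like 'chunks' above

def top_features(X_fit, y_fit, k=25):
    # rank features by absolute class-mean difference (ANOVA-like score)
    diff = np.abs(X_fit[y_fit == 0].mean(axis=0) - X_fit[y_fit == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def cv_accuracy(select_on_all):
    correct = 0
    for f in range(4):
        tr, te = folds != f, folds == f
        # the crucial difference: selection fit on ALL data (test fold
        # included) vs. refit on the training partition only
        sel = top_features(X, y) if select_on_all else top_features(X[tr], y[tr])
        m0 = X[tr & (y == 0)][:, sel].mean(axis=0)   # class-0 centroid
        m1 = X[tr & (y == 1)][:, sel].mean(axis=0)   # class-1 centroid
        d0 = ((X[te][:, sel] - m0) ** 2).sum(axis=1)
        d1 = ((X[te][:, sel] - m1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)
        correct += (pred == y[te]).sum()
    return correct / len(y)

print(cv_accuracy(select_on_all=True))    # inflated: well above chance on noise
print(cv_accuracy(select_on_all=False))   # hovers around 0.5
```

If I understand the PyMVPA API correctly, the equivalent fix there is to wrap selection and classifier together, e.g. `FeatureSelectionClassifier(clf, fsel)`, and hand that to `CrossValidation`, so the selection is retrained on each training partition rather than once on the full dataset.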
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">if we were to talk about bias we would talk about classification of true<br>
effects ;)<br>
<br>
you are trying to learn/classify noise on imbalanced sets -- since you<br>
have 'events' == range(200), each sample/event is taken out<br>
separately, so you have 100 samples of one target (say 1) and 99 of the<br>
other (say 0) in each training set. Since it is pure noise, the<br>
classifier may simply predict the target with the majority of samples<br>
(regardless of the actual data, which is noise), since that minimizes<br>
its objective function during training. As a result you get this<br>
"anti-learning" effect. That is why we usually suggest ensuring an<br>
equal number of samples per category in the training set (or chaining<br>
with a Balancer generator if the data is imbalanced).<br>
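To make the imbalance effect concrete, here is a minimal NumPy sketch (outside PyMVPA) of leave-one-out on 200 balanced, alternating targets with a classifier that always falls back on the majority class of its training set -- roughly what an SVM may do on pure noise:

```python
import numpy as np

# 200 samples with alternating binary targets, as in the original code
targets = np.arange(200) % 2

# leave-one-out with a pure majority-class predictor
errors = 0
for i in range(len(targets)):
    train = np.delete(targets, i)
    majority = np.bincount(train).argmax()
    # the left-out sample's class is always in the minority of the
    # training set (99 vs 100), so the majority vote is wrong every fold
    errors += majority != targets[i]

accuracy = 1 - errors / len(targets)
print(accuracy)  # 0.0: below-chance "anti-learning"
```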
<div class="im"><br>
On Fri, 20 Apr 2012, Ping-Hui Chiu wrote:<br>
<br>
> Dear PyMVPA experts,<br>
> Isn't a leave-one-out cross-validation supposed to produce a smaller bias<br>
> yet a larger variance in comparison to N-fold cross-validations when N<#<br>
> of samples?<br>
<br>
> I ran a sanity check on binary classification of 200 random samples.<br>
> 4-fold cross-validations produced unbiased estimates (~50% correct),<br>
> whereas leave-one-out cross-validations consistently produced<br>
> below-than-chance classification performances (~40% correct). Why?<br>
<br>
> Any insight on this will be highly appreciated!<br>
<br>
> My code is listed below:<br>
<br>
> from mvpa2.suite import *<br>
> clf = LinearCSVMC();<br>
> cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))<br>
> cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))<br>
> acc_chunks=[]<br>
> acc_events=[]<br>
> for i in range(200):<br>
> print i<br>
> ds=Dataset(np.random.rand(200))<br>
</div>> ds.sa['targets']=np.remainder(range(200),2)<br>
> ds.sa['events']=range(200)<br>
> ds.sa['chunks']=np.concatenate((np.ones(50),np.ones(50)*2,np.ones(50)*3,np.ones(50)*4))<br>
<div class="im">> ds_chunks=cv_chunks(ds)<br>
> acc_chunks.append(1-np.mean(ds_chunks))<br>
> ds_events=cv_events(ds)<br>
> acc_events.append(1-np.mean(ds_events))<br>
<br>
> >>>print np.mean(acc_chunks), np.std(acc_chunks)<br>
> 0.50025 0.0442542370853<br>
> >>>print np.mean(acc_events), np.std(acc_events)<br>
> 0.40674 0.189247516232<br>
<br>
> Thanks!<br>
> Dale<br>
<br>
</div>
<br>
> _______________________________________________<br>
> Pkg-ExpPsy-PyMVPA mailing list<br>
> <a href="mailto:Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org">Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org</a><br>
> <a href="http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa" target="_blank">http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa</a><br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
=------------------------------------------------------------------=<br>
Keep in touch <a href="http://www.onerussian.com/" target="_blank">www.onerussian.com</a><br>
Yaroslav Halchenko <a href="http://www.ohloh.net/accounts/yarikoptic" target="_blank">www.ohloh.net/accounts/yarikoptic</a><br>
<br>
</font></span></blockquote></div><br>