Thanks Yaroslav! The previous results make sense now. <br><br>I have a related question: after feature selection on totally random samples, my binary classification accuracy was significantly better than chance (50%). For MVPA with feature selection on real fMRI data, how do we know whether better-than-chance performance reflects true effects or is just an artifact of feature selection?<br>
<br>My code with feature selection is listed below:<br><br>from mvpa2.suite import *<br>fsel = SensitivityBasedFeatureSelection(OneWayAnova(), FixedNElementTailSelector(25, mode='select', tail='upper'))<br>clf = LinearCSVMC()<br>
cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))<br>cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))<br>acc_chunks = []<br>acc_events = []<br>for i in range(100):<br> print i<br>
 ds = Dataset(np.random.rand(200, 100))<br> ds.sa['targets'] = np.remainder(range(200), 2)<br> ds.sa['events'] = range(200)<br> ds.sa['chunks'] = np.concatenate((np.ones(50), np.ones(50)*2, np.ones(50)*3, np.ones(50)*4))<br>
 fsel.train(ds)<br> ds = fsel(ds)<br> ds_chunks = cv_chunks(ds)<br> acc_chunks.append(1 - np.mean(ds_chunks))<br> ds_events = cv_events(ds)<br> acc_events.append(1 - np.mean(ds_events))<br><br>>>> print np.mean(acc_chunks), np.std(acc_chunks)<br>
0.6366 0.0350633712013<br><br>>>> print np.mean(acc_events), np.std(acc_events)<br>0.6405 0.0350820466906<br><br>Thanks!<br>Dale<br><br><div class="gmail_quote">On Fri, Apr 20, 2012 at 12:34 PM, Yaroslav Halchenko <span dir="ltr"><<a href="mailto:debian@onerussian.com">debian@onerussian.com</a>></span> wrote:<br>
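To frame the question, here is a minimal NumPy-only sketch (not using PyMVPA; `top_features` and the nearest-class-mean classifier below are my simplified stand-ins for OneWayAnova scoring and LinearCSVMC) contrasting feature selection fit on the whole dataset against selection refit inside each training fold, on pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))   # pure noise: 200 samples, 100 features
y = np.arange(200) % 2                # balanced binary targets
folds = np.repeat(np.arange(4), 50)   # 4 chunks of 50, like 'chunks' above

def top_features(X_fit, y_fit, k=25):
    # rank features by absolute class-mean difference (ANOVA-like score)
    diff = np.abs(X_fit[y_fit == 0].mean(axis=0) - X_fit[y_fit == 1].mean(axis=0))
    return np.argsort(diff)[-k:]

def cv_accuracy(select_on_all):
    correct = 0
    for f in range(4):
        tr, te = folds != f, folds == f
        # the crucial difference: selection fit on ALL data (test fold
        # included) vs. refit on the training partition only
        sel = top_features(X, y) if select_on_all else top_features(X[tr], y[tr])
        m0 = X[tr & (y == 0)][:, sel].mean(axis=0)   # class-0 centroid
        m1 = X[tr & (y == 1)][:, sel].mean(axis=0)   # class-1 centroid
        d0 = ((X[te][:, sel] - m0) ** 2).sum(axis=1)
        d1 = ((X[te][:, sel] - m1) ** 2).sum(axis=1)
        pred = (d1 < d0).astype(int)
        correct += (pred == y[te]).sum()
    return correct / len(y)

print(cv_accuracy(select_on_all=True))    # inflated: well above chance on noise
print(cv_accuracy(select_on_all=False))   # hovers around 0.5
```

If I understand the PyMVPA API correctly, the equivalent fix there is to wrap selection and classifier together, e.g. `FeatureSelectionClassifier(clf, fsel)`, and hand that to `CrossValidation`, so the selection is retrained on each training partition rather than once on the full dataset.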
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">if we were to talk about bias we would talk about classification of true<br>
effects ;)<br>
<br>
you are trying to learn/classify noise on imbalanced sets -- since you<br>
have 'events' == range(200), each sample/event is taken out<br>
separately, so you have 100 samples of one target (say 1) and 99 of the<br>
other (say 0) in each training set. Since it is pure noise, the<br>
classifier may simply predict the target with the majority of samples<br>
(regardless of the actual data, which is noise), since that minimizes<br>
its objective function during training. As a result you get this<br>
"anti-learning" effect. That is why we usually suggest ensuring an<br>
equal number of samples per category in the training set (or chaining<br>
with a Balancer generator if the data is imbalanced).<br>
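To make the imbalance effect concrete, here is a minimal NumPy sketch (outside PyMVPA) of leave-one-out on 200 balanced, alternating targets with a classifier that always falls back on the majority class of its training set -- roughly what an SVM may do on pure noise:

```python
import numpy as np

# 200 samples with alternating binary targets, as in the original code
targets = np.arange(200) % 2

# leave-one-out with a pure majority-class predictor
errors = 0
for i in range(len(targets)):
    train = np.delete(targets, i)
    majority = np.bincount(train).argmax()
    # the left-out sample's class is always in the minority of the
    # training set (99 vs 100), so the majority vote is wrong every fold
    errors += majority != targets[i]

accuracy = 1 - errors / len(targets)
print(accuracy)  # 0.0: below-chance "anti-learning"
```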
<div class="im"><br>
On Fri, 20 Apr 2012, Ping-Hui Chiu wrote:<br>
<br>
> Dear PyMVPA experts,<br>
> Isn't a leave-one-out cross-validation supposed to produce a smaller bias<br>
> yet a larger variance in comparison to N-fold cross-validations when N<#<br>
> of samples?<br>
<br>
> I ran a sanity check on binary classification of 200 random samples.<br>
> 4-fold cross-validations produced unbiased estimates (~50% correct),<br>
> whereas leave-one-out cross-validations consistently produced<br>
> below-than-chance classification performances (~40% correct). Why?<br>
<br>
> Any insight on this will be highly appreciated!<br>
<br>
> My code is listed below:<br>
<br>
> from mvpa2.suite import *<br>
> clf = LinearCSVMC();<br>
> cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))<br>
> cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))<br>
> acc_chunks=[]<br>
> acc_events=[]<br>
> for i in range(200):<br>
> print i<br>
> ds=Dataset(np.random.rand(200))<br>
</div>> ds.sa['targets']=np.remainder(range(200),2)<br>
> ds.sa['events']=range(200)<br>
> ds.sa['chunks']=np.concatenate((np.ones(50),np.ones(50)*2,np.ones(50)*3,np.ones(50)*4))<br>
<div class="im">> ds_chunks=cv_chunks(ds)<br>
> acc_chunks.append(1-np.mean(ds_chunks))<br>
> ds_events=cv_events(ds)<br>
> acc_events.append(1-np.mean(ds_events))<br>
<br>
> >>>print np.mean(acc_chunks), np.std(acc_chunks)<br>
> 0.50025 0.0442542370853<br>
> >>>print np.mean(acc_events), np.std(acc_events)<br>
> 0.40674 0.189247516232<br>
<br>
> Thanks!<br>
> Dale<br>
<br>
</div>
<br>
> _______________________________________________<br>
> Pkg-ExpPsy-PyMVPA mailing list<br>
> <a href="mailto:Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org">Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org</a><br>
> <a href="http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa" target="_blank">http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa</a><br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
=------------------------------------------------------------------=<br>
Keep in touch <a href="http://www.onerussian.com/" target="_blank">www.onerussian.com</a><br>
Yaroslav Halchenko <a href="http://www.ohloh.net/accounts/yarikoptic" target="_blank">www.ohloh.net/accounts/yarikoptic</a><br>
<br>
</font></span></blockquote></div><br>