<div dir="ltr">Hi,<br><br>I see. The error rate was lowest with the full set of features, so no features were selected. However, I still don&#39;t understand how to select a fixed number of features using RFE. More specifically:<br>

<br>1. I would like to get the 30 features that give the best prediction. I don&#39;t care that 31 features (or 3022) would give a better prediction. Wasn&#39;t Fig. 1 in your paper &quot;Full Brain Classification: There Is No &ldquo;Face&rdquo; Identification Area&quot; the result of such an analysis? In addition, I have attached the output I get from classifying another dataset, which resulted in the desired 30 features. However, I am not sure how to interpret it. If I read it correctly, starting from step 1, a single subset of 30 features was selected and then classified at every step. What about the other possible subsets? How does RFE know that this one is the best? Did it just pick the top-ranked features from the original 3022? I am not sure that this is optimal. If you have a working example of correct / optimal RFE usage, I would very much appreciate it if you sent it to me.<br><br>2. Unfortunately, even after reading Guyon 2002, I feel that I don&#39;t fully understand the RFE algorithm. In particular, what is the size of the initial feature subset that the algorithm starts with? Does it really start with the full feature set, although with 1000 voxels that is evident overfitting? The solution with 3022 voxels that I got is not going to generalize well (given that I have only 480 trials), so what is the benefit of such a solution? Any references that clarify these issues would be more than welcome.<br>

<br>Thank you for your assistance.<br>I am seriously considering using PyMVPA, because I was impressed by the robustness of this software. However, although your documentation is well written and organized, I still get stuck in some places.<br>

<br>Vadim<br><br><br><div class="gmail_quote">2009/4/26 Yaroslav Halchenko <span dir="ltr">&lt;<a href="mailto:debian@onerussian.com">debian@onerussian.com</a>&gt;</span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

actually I should have discovered the problem before asking you to<br>

upload the data...<br>

<br>

in your code you use<br>

N_FEATURES = 30<br>

...<br>

<div class="im">&nbsp; &nbsp; &nbsp; feature_selector=FixedNElementTailSelector(N_FEATURES,<br>

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; tail=&#39;upper&#39;, mode=&#39;select&#39;),<br>

<br>

<br>

</div>so you aren&#39;t doing RFE per se ;) you just select 30 features right<br>

at the first step of RFE... then, since those 30 features lead to a higher<br>

generalization error than all of the features together, the initial<br>

dataset with all features is taken as the result.<br>

<br>

to see that, you just had to enable the RFE debug target (or all RFE ones)<br>

with<br>

<br>

debug.active += [&#39;RFE.*&#39;]<br>

<br>

to see what is happening:<br>

<br>

In [12]:## working on region in file /tmp/python-8102meB.py...<br>

[RFEC ] DBG:           Step 0: nfeatures=3022<br>
[RFEC ] DBG:           Step 0: nfeatures=3022 error=0.2125 best/stop=1/0<br>
[RFEC_] DBG:           Sensitivity: [-0.00507313  0.00025722  0.00159871 ..., -0.00212875  0.00078268<br>
 -0.00027174], nfeatures_selected=30, selected_ids: [ 120  338  341  356  462  472  483  501  517  571  573  574  594  612  619<br>
  634  635  636  659  676  677  760  778  779  796  872 1109 1338 1545 1677]<br>
[RFEC ] DBG:           Step 1: nfeatures=30<br>
[RFEC ] DBG:           Step 1: nfeatures=30 error=0.2500 best/stop=0/0<br>
[RFEC_] DBG:           Sensitivity: [ 0.09779742  0.16359045  0.02775154  0.09486282 -0.0804099  -0.04392221<br>
 -0.06721182  0.09752928  0.03872871  0.08811431  0.14541801  0.13167303<br>
  0.13925132  0.03046704  0.04748648  0.09525846 -0.04226041  0.06917038<br>
  0.03207438  0.06333298  0.01423283  0.02703152  0.16574083  0.05634531<br>
  0.11383484  0.03402658  0.07105218 -0.02116503  0.24369252  0.20591227], nfeatures_selected=30, selected_ids: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24<br>
 25 26 27 28 29]<br>
[RFEC ] DBG:           Step 2: nfeatures=30<br>
[RFEC ] DBG:           Step 2: nfeatures=30 error=0.2500 best/stop=0/0<br>
[RFEC_] DBG:           Sensitivity: [ 0.09779742  0.16359045  0.02775154  0.09486282 -0.0804099  -0.04392221<br>
 -0.06721182  0.09752928  0.03872871  0.08811431  0.14541801  0.13167303<br>
  0.13925132  0.03046704  0.04748648  0.09525846 -0.04226041  0.06917038<br>
  0.03207438  0.06333298  0.01423283  0.02703152  0.16574083  0.05634531<br>
  0.11383484  0.03402658  0.07105218 -0.02116503  0.24369252  0.20591227], nfeatures_selected=30, selected_ids: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24<br>
 25 26 27 28 29]<br>
[RFEC ] DBG:           Step 3: nfeatures=30<br>
[RFEC ] DBG:           Step 3: nfeatures=30 error=0.2500 best/stop=0/0<br>
[RFEC_] DBG:           Sensitivity: [ 0.09779742  0.16359045  0.02775154  0.09486282 -0.0804099  -0.04392221<br>
 -0.06721182  0.09752928  0.03872871  0.08811431  0.14541801  0.13167303<br>
  0.13925132  0.03046704  0.04748648  0.09525846 -0.04226041  0.06917038<br>
  0.03207438  0.06333298  0.01423283  0.02703152  0.16574083  0.05634531<br>
  0.11383484  0.03402658  0.07105218 -0.02116503  0.24369252  0.20591227], nfeatures_selected=30, selected_ids: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24<br>
 25 26 27 28 29]<br>

<br>

....<br>

<br>

see original RFE definition on how to actually do RFE ;) or just try SMLR<br>

which might be more efficient, who knows ;)<br>
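<br>
For reference, the elimination loop from the original RFE definition can be sketched in plain NumPy, independent of PyMVPA. This is only an illustration under stated assumptions: the helper names linear_weights and rfe, the ridge-regression weights used as the sensitivity measure, and the 10% discard fraction are all hypothetical choices, not PyMVPA&#39;s API or defaults (Guyon et al. remove one feature, or a chunk of features, per step).<br>

```python
import numpy as np


def linear_weights(X, y, ridge=1.0):
    """Stand-in sensitivity measure: ridge-regression weights for labels in {-1, +1}."""
    # Dual form, cheap when features outnumber samples: w = X^T (X X^T + ridge*I)^-1 y
    gram = X @ X.T + ridge * np.eye(X.shape[0])
    return X.T @ np.linalg.solve(gram, y)


def rfe(X, y, n_keep, discard_fraction=0.1):
    """Retrain and drop the weakest features repeatedly until n_keep remain."""
    if n_keep not in range(1, X.shape[1] + 1):
        raise ValueError("n_keep must be between 1 and the number of features")
    selected = np.arange(X.shape[1])
    while selected.size != n_keep:
        weights = linear_weights(X[:, selected], y)
        # Drop a fraction per step, at least one, never overshooting n_keep
        n_drop = max(1, int(discard_fraction * selected.size))
        n_drop = min(n_drop, selected.size - n_keep)
        weakest_first = np.argsort(np.abs(weights))
        selected = np.sort(selected[weakest_first[n_drop:]])
    return selected


# Synthetic data (assumed for illustration): 80 trials, 200 features,
# only the first 5 carry signal
rng = np.random.default_rng(0)
X = rng.standard_normal((80, 200))
y = np.sign(X[:, :5].sum(axis=1))
kept = rfe(X, y, n_keep=30)
print(kept.size)  # 30
```

Unlike selecting 30 features in a single step, each iteration here retrains on the surviving features and re-ranks them before discarding more, so the final 30 are not simply the top ranks from the full feature set.<br>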

<div><div></div><div class="h5"><br>

<br>

On Sat, 25 Apr 2009, Yaroslav Halchenko wrote:<br>

<br>

&gt; at first I thought that I know what is the reason, but then I realized<br>

&gt; that it shouldn&#39;t be... didn&#39;t test though. to expedite things would you<br>

&gt; mind uploading your data + code to the address I will provide you in a<br>

&gt; followup email? ;)<br>

<br>

&gt; On Sat, 25 Apr 2009, Vadim Axel wrote:<br>

<br>

&gt; &gt;    Hi,<br>

&gt; &gt;    I implemented some simple RFE logic, similar to what is described<br>

&gt; &gt;    here: [1]<a href="http://www.pymvpa.org/featsel.html" target="_blank">http://www.pymvpa.org/featsel.html</a><br>

&gt; &gt;    At the end of the classification procedure, I check which features<br>

&gt; &gt;    were selected, following what is described here:<br>

&gt; &gt;    [2]<a href="http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection" target="_blank">http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection</a><br>

&gt; &gt;    Now the problem: sometimes the resulting number of selected features is<br>

&gt; &gt;    exactly the number requested (I use FixedNElementTailSelector),<br>

&gt; &gt;    whereas in other cases, for a completely unknown reason, I get the full<br>

&gt; &gt;    set of features. The issue is really weird, since for two sessions of<br>

&gt; &gt;    a subject I get the selected feature set, but for two other sessions of the<br>

&gt; &gt;    same subject I get the full feature set. I suspect that the problem might<br>

&gt; &gt;    be in updating the feature_ids variable and not in the classification,<br>

&gt; &gt;    because the classification error rate was pretty low.<br>

&gt; &gt;    My code is attached. Is there any problem with it?<br>

&gt; &gt;    I can also upload my dataset (~50 MB zip). I did not manage to<br>

&gt; &gt;    reproduce it with a smaller amount of data.<br>

&gt; &gt;    Thanks for your help,<br>

&gt; &gt;    Vadim<br>

<br>

&gt; &gt; References<br>

<br>

&gt; &gt;    1. <a href="http://www.pymvpa.org/featsel.html" target="_blank">http://www.pymvpa.org/featsel.html</a><br>

&gt; &gt;    2. <a href="http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection" target="_blank">http://www.pymvpa.org/faq.html#how-do-i-know-which-features-were-finally-selected-by-a-classifier-doing-feature-selection</a><br>


--<br>

Yaroslav Halchenko<br>

Research Assistant, Psychology Department, Rutgers-Newark<br>

Student  Ph.D. @ CS Dept. NJIT<br>

Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171<br>

        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102<br>

WWW:     <a href="http://www.linkedin.com/in/yarik" target="_blank">http://www.linkedin.com/in/yarik</a><br>

<br>

_______________________________________________<br>

Pkg-ExpPsy-PyMVPA mailing list<br>

<a href="mailto:Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org">Pkg-ExpPsy-PyMVPA@lists.alioth.debian.org</a><br>

<a href="http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa" target="_blank">http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa</a><br>

</div></div></blockquote></div><br></div>