[pymvpa] Question about sensitivity analysis and SVM in tutorial part 6

Xiaokun Xu xiaokunx at usc.edu
Wed Feb 29 09:38:57 UTC 2012


Dear all:

   I have a question about the rationale on this page:
http://pymvpa.org/tutorial_sensitivity.html#chap-tutorial-sensitivity

1. feature selection based on ANOVA

------------------------------------------------------------------------------------------------------
>>> fsel.train(bin_demo)
>>> bin_demo_p = fsel(bin_demo)
>>> results = cvte(bin_demo_p)
>>> print cvte.ca.stats.stats["ACC%"]
100.0
Wow, that is a jump. Perfect classification performance, even though
the same categories couldn't be distinguished by the same classifier
when trained on all eight categories. I guess it is obvious that our
way of selecting features is somewhat fishy, if not illegal. The
ANOVA measure uses the full dataset to compute the F-scores, hence it
determines which features show category differences in the whole
dataset, including our supposed-to-be independent testing data. Once
we have found these differences, we are trying to rediscover them with
a classifier. Being able to do that is not surprising, and precisely
constitutes the double-dipping procedure. As a result, the obtained
prediction accuracy and the created model are potentially completely
meaningless.
------------------------------------------------------------------------------------------------------

If I understand correctly, the 100% classification accuracy mentioned
above is only the training-set result, not the cross-validation
accuracy, which I suppose is stored in results.samples. Therefore, I
don't understand why the author blames the inclusion of the
to-be-tested part of the data.

In my naive opinion, I do agree with the conclusion that using ANOVA
to select features (based on their categorical differences) and then
decoding the category commits the double-dipping fallacy. However,
that occurs independently of the training/testing split. Doing feature
selection based on ANOVA is double dipping if what you ask the
classifier to do later is essentially to reconfirm the ANOVA result,
even if the selection is based on training data only.
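
As far as I understand, the clean way is to nest the selection inside
the cross-validation, so that the F-scores are computed on the
training folds only. A minimal sketch of that, assuming the tutorial's
bin_demo dataset and PyMVPA's meta-classifier (class names as I
remember them from the docs):

    from mvpa2.suite import *

    clf = LinearCSVMC()
    # select the 5% of features with the highest F-scores
    fsel = SensitivityBasedFeatureSelection(
               OneWayAnova(),
               FractionTailSelector(0.05, mode='select', tail='upper'))
    # wrapping the classifier makes the ANOVA part of training, so
    # each fold recomputes F-scores on its training portion only
    fclf = FeatureSelectionClassifier(clf, fsel)
    cvte = CrossValidation(fclf, NFoldPartitioner(), enable_ca=['stats'])
    results = cvte(bin_demo)
    print cvte.ca.stats.stats["ACC%"]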


2. binary classification inside SVM: one-versus-one, or one-versus-the-rest?

-----------------------------------------------------------------------------------------------------
>>> # alt: `sens = load_tutorial_results('res_haxby2001_sens_5pANOVA')`
>>> sens = sensana(ds)
>>> type(sens)
<class 'mvpa2.datasets.base.Dataset'>
>>> print sens.shape
(28, 39912)
Why do we get 28 sensitivity maps from the classifier? The support
vector machine constructs a model for binary classification
problems. To be able to deal with this 8-category dataset, the data is
internally split into all possible binary problems (there are exactly
28 of them). The sensitivities are extracted for all these partial
problems.
------------------------------------------------------------------------------------------------------

The number 28 comes from choosing 2 out of the 8 categories:
C(8,2) = 8!/(2!(8-2)!) = 28.
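
A quick sanity check of that count in plain Python (category names are
just the Haxby ones, for illustration):

    from itertools import combinations

    cats = ['face', 'house', 'cat', 'chair', 'shoe', 'bottle',
            'scissors', 'scrambledpix']
    # unordered pairs: C(8, 2) = 8! / (2! * 6!) = 28
    print len(list(combinations(cats, 2)))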

That arithmetic works out, but it leaves me with a more general
question: how does the SVM predict the target class?

In this example, we have 8 targets.

solution 1. get a vote from each of the 28 possible binary decisions,
{face/shoes, face/house, ..., bottle/shoes}, and pick the class that
wins the most of them? (see the sketch after solution 2)

solution 2. get a vote from each of the 8 possible one-vs-rest
decisions, {face/non-face, cat/non-cat, ...}, and pick the class with
the maximum margin (the most dominant win in the vote)?
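
To make solution 1 concrete, here is a toy sketch of pairwise
(one-vs-one) voting; predict_pair is a hypothetical stub standing in
for the 28 trained binary classifiers:

    from itertools import combinations
    from collections import Counter

    def predict_multiclass(sample, classes, predict_pair):
        # predict_pair(sample, a, b) is hypothetical: it returns
        # whichever of the two classes the binary SVM trained on the
        # pair (a, b) prefers for this sample
        votes = Counter()
        for a, b in combinations(classes, 2):
            votes[predict_pair(sample, a, b)] += 1
        # the class that wins the most pairwise contests is predicted
        return votes.most_common(1)[0][0]

If I remember correctly, LIBSVM (which PyMVPA's SVMs wrap by default)
uses exactly this one-against-one voting scheme, but I would be glad
to have that confirmed.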



Thanks!

Best,
Xiaokun


