[pymvpa] Does it make sense to compare SVM weights between two different SVM classifiers?

Yaroslav Halchenko debian at onerussian.com
Wed Nov 14 20:27:21 UTC 2012


On Wed, 14 Nov 2012, Meng Liang wrote:

>    Dear MVPA experts,

>    In my study, I used the fMRI signals from a given ROI to predict the
>    stimulus type for two different classification tasks: (1) type A vs. type
>    B, and (2) type C vs. type D (the two classification tasks were performed
>    on the same ROI but during different trials: the fMRI data used for task
>    'A vs. B' were taken from trials A and trials B, and the data used for
>    task 'C vs. D' were taken from trials C and trials D). It was expected
>    that this ROI should provide a higher classification accuracy in the task
>    of 'A vs. B' than in the task of 'C vs. D'. The results indeed confirmed
>    this. I just wonder whether the higher classification accuracy in the task
>    of 'A vs. B' (presumably the higher capability of the classifier in task
>    'A vs. B') relative to the task 'C vs. D' could be reflected in the
>    sensitivity maps (i.e., SVM weights) in some way? For example, would the
>    SVM of task 'A vs. B' have higher SVM weights or a larger margin compared
>    to the SVM of task 'C vs. D'? In other words, can I directly compare the
>    sensitivity maps obtained from the two different classification tasks?

Although there are studies analyzing SVM weights for diagnosticity, for
your particular comparison my exhaustive/complete answer would be:
"not sure".

I hadn't myself realized how interestingly the margin of a soft-margin
SVM behaves as the level of signal in the data increases.  I
(mistakenly) thought that higher signal would always lead to a wider
margin, but a simple test/demo revealed that I was wrong:

http://nbviewer.ipython.org/url/www.onerussian.com/tmp/CSVM_margin_demo.ipynb
P.S. feel free to download the notebook and play with it yourself

The demo keeps the data and the SVM's C fixed and changes only the
signal level (here in the 1st of the 2 features).  As long as the SVM
cannot yet learn perfectly, the margin shrinks; it then starts to
expand again once the classes become separable.  In hindsight that
makes sense -- e.g. with no signal the SVM simply cannot learn, so the
training error will be large no matter what, and there is no incentive
to keep the margin narrow, so it expands without sacrificing any of the
(already degenerate) "generalization" performance.
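
If you would rather not run the notebook, here is a minimal sketch of
the same experiment -- written from memory against scikit-learn, not
copied from the notebook, so treat the names and parameters as
assumptions:

    # Minimal sketch (not the notebook itself): watch the margin width of
    # a linear soft-margin SVM as signal is added to the 1st of 2 features.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    n = 100                                # samples per class
    noise = rng.randn(2 * n, 2)            # 2 purely noisy features
    labels = np.repeat([-1, 1], n)

    for signal in [0.0, 0.5, 1.0, 2.0, 4.0]:
        X = noise.copy()
        X[:, 0] += signal * labels         # inject signal into feature 1
        clf = SVC(kernel='linear', C=1e6)  # huge C ~ 'hard-margin' regime
        clf.fit(X, labels)
        margin = 2.0 / np.linalg.norm(clf.coef_)  # geometric margin width
        err = np.mean(clf.predict(X) != labels)
        print("signal=%.1f  training error=%.2f  margin width=%.3f"
              % (signal, err, margin))

In this toy setup, with no signal the weights stay near zero and the
margin comes out huge; it narrows while the SVM struggles, and widens
again once the classes separate -- the non-monotonic pattern described
above.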

In the demo above I set the SVM's C to a very high value to get closer
to the 'hard-margin' SVM scenario whenever the classes become
separable.  But the picture is similar with smaller C values (and e.g.
C=-1, where PyMVPA scales it according to the mean norm of the training
data), although with softer margins the switch back to an expanding
margin can happen before reaching the "perfect learning" point.
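
In PyMVPA itself the scaled-C variant would look something along these
lines (an untested sketch from memory; the dataset-generator parameters
are just illustrative, so double-check against your PyMVPA version):

    # Sketch: margin width from PyMVPA's LinearCSVMC with data-scaled C
    import numpy as np
    from mvpa2.misc.data_generators import normal_feature_dataset
    from mvpa2.clfs.svm import LinearCSVMC

    # toy 2-feature dataset; snr controls the signal level
    ds = normal_feature_dataset(perlabel=100, nlabels=2, nfeatures=2,
                                nonbogus_features=[0, 1], snr=2.0)

    clf = LinearCSVMC(C=-1)        # negative C: |C| scaled by data norm
    sens = clf.get_sensitivity_analyzer()(ds)  # trains, yields weights
    w = np.asarray(sens.samples).ravel()
    print("margin width ~ %.3f" % (2.0 / np.linalg.norm(w)))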

So, depending on how big your ROI is, the number of samples, the value
of C, etc., the SVM might or might not learn perfectly, and imho this
non-monotonic behavior of the soft-margin SVM complicates any direct
comparison of margin widths between different classification tasks.

As for the individual coefficients of the hyperplane -- they are even
scarier beasts to interpret...  Feature selection (SVM+RFE or SMLR) and
analysis of the "optimal number of features" might provide some
interesting food for discussion, but could again be argued against.
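
As a quick illustration of that "optimal number of features" angle,
here is a hedged sketch using scikit-learn's RFECV as a stand-in for
PyMVPA's RFE machinery (the data and parameters are made up):

    # Sketch: recursive feature elimination around a linear SVM, scoring
    # each feature-set size by cross-validation (scikit-learn stand-in)
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.feature_selection import RFECV

    rng = np.random.RandomState(0)
    X = rng.randn(200, 50)                  # 50 'voxels', mostly noise
    y = np.repeat([0, 1], 100)
    X[:, :5] += 0.8 * (2 * y[:, None] - 1)  # signal in first 5 features

    rfecv = RFECV(estimator=SVC(kernel='linear', C=1.0), step=1, cv=5)
    rfecv.fit(X, y)
    print("'optimal' number of features: %d" % rfecv.n_features_)

Whether such an "optimal" count is itself comparable across tasks is,
again, debatable.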

>    I'm not sure if I asked my question clearly. Please let me know if there
>    is anything unclear.

nah -- it was clear I think -- or did I misunderstand it?

-- 
Yaroslav O. Halchenko
Postdoctoral Fellow,   Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


