[pymvpa] SMLR weights

Yaroslav Halchenko debian at onerussian.com
Sat Jan 24 13:56:38 UTC 2009


> >no, not at all ;) for that case I was just curious to have a look at the values from training data themselves.
> But your question actually led me to one about cross-validation output, esp. for the leave-one-out method. 
> I noticed the harvest_attribs option in CrossValidatedTransferError, and was wondering whether the sensitivities or 
> other measures harvested there should always be used in place of those from the full dataset without cross-validation. 
depends on your assumptions, goals, and what kind of sensitivity is at
hand ;-)

> For sensitivity particularly, CrossValidatedTransferError gives a set of sens values for each run, and I'm not sure 
> how they could be combined. In SMLR, for instance, I guess there might be tiny shifts in the selected voxels in each 
> leave-one-out run. 
what kinds of shifts?

in SMLR there is another tricky point -- it does feature selection, so
any kind of analysis of sensitivities across splits might need to take
that into account. look at our 2nd paper though: 
http://frontiersin.org/neuroinformatics/paper/10.3389/neuro.11/003.2009/
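e.g., one simple way to account for the selection is to look at how often each
feature ends up with a nonzero SMLR weight across splits (a toy numpy sketch
with made-up weights, not pymvpa code):

```python
import numpy as np

# Hypothetical per-split SMLR weight vectors (features x splits);
# in a real analysis these would come from the harvested sensitivities.
rng = np.random.RandomState(0)
weights = rng.randn(100, 8) * (rng.rand(100, 8) > 0.7)  # sparse, as SMLR tends to be

# Fraction of splits in which each feature received a nonzero weight --
# a simple stability score for the selection
selected = weights != 0
stability = selected.mean(axis=1)

# features selected in every split vs. features selected at least once
stable_core = np.flatnonzero(stability == 1.0)
ever_selected = np.flatnonzero(stability > 0)
```

features with stability close to 1 are the ones SMLR picks regardless of which
sample is left out, so they are the safer ones to interpret.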

taking sensitivities across splits allows you to judge the significance of
the values, if you allow yourself to consider them as independent samples
of sensitivities drawn from some distribution... so you obtain an
error margin on their mean across the splits
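for instance (toy numpy sketch with made-up numbers, not pymvpa code; note
that treating leave-one-out splits as independent samples is itself a strong
assumption):

```python
import numpy as np

# Hypothetical sensitivities from 4 leave-one-out splits (splits x features)
sens = np.array([[0.9, 0.1, 0.0],
                 [1.1, 0.2, 0.1],
                 [1.0, 0.0, 0.0],
                 [0.8, 0.3, 0.1]])

# mean sensitivity per feature across splits
mean_sens = sens.mean(axis=0)

# standard error of that mean, pretending splits are independent samples
sem = sens.std(axis=0, ddof=1) / np.sqrt(sens.shape[0])
```

a feature whose mean is large compared to its error margin is the one you
can be more confident about.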

> Would a mean across runs still be a valid sensitivity? Any suggestions on that?
anything you do is valid, as long as you state your prior assumptions ;-)

> I found a NFoldSplitter() can be added in SMLRWeights(SMLR() ) and give 
> a sensitivity vector.
not sure what is added where... just cut-and-paste the source snippet

> This is however apparently not the mean of the harvest_attribs one, 
> as the number of selected voxels is much smaller for the former. So it seems the voxel 
> shift issue across CV runs is dealt with already.
not clear which 'voxel shift' we are talking about -- the one which we hope
is addressed during the motion-correction preprocessing stage?

but iirc in your case you work on anatomicals, i.e. there is no
time sequence but different subjects, which are resliced into some
common space; of course there is variability but...

> Is this the one that should be preferred 
> to the one from full dataset (without cross-validation)?
the sensitivity mean might be more stable and less noisy imho, so once
again it depends on what your goals are

> Is there a general form of
> clf().getSensitivityAnalyzer() with a NFoldSplitter() option?
kinda... iirc it is
SplitFeaturewiseDatasetMeasure
just look at its constructor help -- it is pretty much as simple as
SplitFeaturewiseDatasetMeasure(splitter, measure)
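to give the flavor of that pattern, here is a self-contained toy stand-in
(plain numpy; none of these names are actual pymvpa API -- it just shows the
idea of applying a featurewise measure per split and stacking the results):

```python
import numpy as np

def nfold_indices(n_samples, n_folds):
    """Yield (train, test) index arrays for simple N-fold splitting."""
    folds = np.array_split(np.arange(n_samples), n_folds)
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, test

def split_featurewise_measure(data, labels, measure, n_folds):
    """Apply `measure(data, labels) -> per-feature vector` on each split's
    training portion, stacking one sensitivity vector per split."""
    return np.vstack([measure(data[tr], labels[tr])
                      for tr, _ in nfold_indices(len(data), n_folds)])

# example featurewise measure: correlation of each feature with the labels
def corr_measure(X, y):
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)

rng = np.random.RandomState(1)
X = rng.randn(40, 5)
y = (rng.rand(40) > 0.5).astype(float)
per_split = split_featurewiwise = split_featurewise_measure(X, y, corr_measure, n_folds=4)
```

the real thing wraps any sensitivity analyzer the same way, so you get a
(splits x features) matrix you can then average or test across splits.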

> I know I need to go to the documentation / source code and read more carefully. I guess
> for now a simple hint about what you would choose (or chose) for a paper would be 
> helpful enough. Thanks!
you are welcome!

yeah -- a documentation reading session would help, but I would advise
getting through both our papers (they are shortish) and then glancing over
the code in the supplementary materials of the 2nd paper -- I bet you would
feel more comfortable with pymvpa after that.

> >Thanks again for your response!

> >Best, Frank


> > Yeah the dot_prod is not that important. I just tried to get a little more idea about how it works.
> > Have a nice weekend!
> U2 ;-)

> > Best, Frank
-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        
