[pymvpa] RFE and dataset splits

Yaroslav Halchenko debian at onerussian.com
Wed Jul 6 20:41:12 UTC 2011


neh -- that was a good call on my part, but it still wasn't it -- just read it
nevertheless ;)

On Wed, 06 Jul 2011, Yaroslav Halchenko wrote:

> Hi Kimberly,

> sorry for the delay -- we are finally back from HBM and getting back
> into the routine pace...   I am about to start the RFE on your dataset
> but I immediately spotted that the data wasn't normalized to make it
> SVM-friendly:

>     In [8]: print ds.summary()
>     Dataset: 48x135168 at float64, <sa: chunks,targets,time_coords,time_indices>, <fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>     stats: mean=0.00193459 std=1.16806 var=1.36437 min=-139.459 max=116.782


> So, please do the following (in your script, so that you can maybe see
> for yourself whether everything is ok):

> 1.
>     zscore(ds, chunks_attr=None)

> which would do standardization across all samples (it is meaningless to
> do it per chunk in your case since you have only 2 samples per
> chunk), which would lead you to

>     stats: mean=9.39588e-16 std=0.574634 var=0.330204 min=-6.70418 max=6.77941

> 2.  I guess you didn't really mask the volume (i.e. exclude non-brain
> voxels), and thus have lots of invariant features (e.g. having 0s
> through all samples):

> so you could get rid of them:

>     In [25]: ds = remove_invariant_features(ds)
>     In [29]: print ds.summary()
>     Dataset: 48x44633 at float64, <sa: chunks,targets,time_coords,time_indices>, <fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
>     stats: mean=2.84548e-15 std=1 var=1 min=-6.70418 max=6.77941

> which should just help it to get through faster ;-) 
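
To make the two steps above concrete, here is a minimal NumPy-only sketch of what they amount to. It is not PyMVPA's implementation: the helper names `zscore_all_samples` and `drop_invariant_features` are hypothetical, the toy 48x10 array is made up, and I assume the population standard deviation (ddof=0) and that constant features are left at 0 by the z-scoring. Under those assumptions the overall variance after z-scoring equals the fraction of non-constant features, which would agree with the summaries quoted above (44633/135168 is about 0.3302, the reported var).

```python
import numpy as np


def zscore_all_samples(samples):
    """Z-score every feature (column) using all samples (rows) at once.

    Constant features are left at 0 instead of dividing by a zero std
    (an assumption of this sketch).
    """
    samples = np.asarray(samples, dtype=float)
    varying = samples.max(axis=0) != samples.min(axis=0)
    out = np.zeros_like(samples)
    mean = samples[:, varying].mean(axis=0)
    std = samples[:, varying].std(axis=0)  # population std, ddof=0
    out[:, varying] = (samples[:, varying] - mean) / std
    return out


def drop_invariant_features(samples):
    """Return the samples without constant columns, plus the boolean mask."""
    samples = np.asarray(samples)
    keep = samples.max(axis=0) != samples.min(axis=0)
    return samples[:, keep], keep


# Toy stand-in for an unmasked volume: 48 samples, 10 "voxels",
# two of which are 0 in every sample (like non-brain voxels).
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=20.0, size=(48, 10))
data[:, [2, 7]] = 0.0

z = zscore_all_samples(data)
reduced, keep = drop_invariant_features(z)

print("after z-scoring:      shape", z.shape, "var", z.var())
print("invariant removed:    shape", reduced.shape, "var", reduced.var())
```

In the toy case the variance goes from the fraction of varying features (8 out of 10) to 1 once the constant columns are dropped, the same pattern as in the two `ds.summary()` outputs.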

-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic


