[pymvpa] z-score before or after removal of junk data

Yaroslav Halchenko debian at onerussian.com
Mon Dec 2 21:55:44 UTC 2013


On Mon, 02 Dec 2013, mhampel at uni-osnabrueck.de wrote:

> Dear all,

> I wondered if there is a good reason to keep data that is not needed when
> z-scoring? In the searchlight and sensitivity measures examples the order
> is as follows:

> 1. zscore(dataset, dtype='float32', chunks_attr='chunks')
> 2. dataset = dataset[dataset.sa.targets != '0']

> But if I reverse the order, my accuracy values are much higher (10-20%)
> when classifying on the whole brain. For searchlights the accuracies are
> also higher. On the searchlight group level I therefore need a much higher
> threshold for the p-values; with a higher threshold I get similar results.

> Is it okay to do the z-score after removing the junk data?

It should be perfectly fine, and probably advisable, if your "junk data"
is indeed "junk data".

In some cases those "targets == '0'" samples are meaningful (e.g. a
"control condition"): they are of no direct importance for the MVPA
itself, but they could give you a better sense of the variability among
voxels, etc.  Moreover, someone might want to standardize the != '0'
samples against the baseline's offset/variance by specifying that
condition in zscore's param_est argument.  That is why, in general, we
kept z-scoring before selecting only the relevant categories.
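
A rough sketch of that alternative (param_est takes a tuple of an
attribute name and the value(s) selecting the samples used to estimate
the parameters; here I again assume the baseline is labeled '0'):

    from mvpa2.mappers.zscore import zscore  # also exported by mvpa2.suite

    # estimate mean/std per chunk from the baseline ('0') samples only,
    # but apply the standardization to all samples
    zscore(dataset, dtype='float32', chunks_attr='chunks',
           param_est=('targets', ['0']))
    # only afterwards discard the baseline samples from further analysis
    dataset = dataset[dataset.sa.targets != '0']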

hope this helps ;)
-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate,     Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


