[pymvpa] Train and test on different classes from a dataset

Thu Jan 31 11:29:49 UTC 2013

Michael,

Thank you very much for your help!!! The first step seems to give plausible
results (in the sense that the accuracy maps for training and testing on A/B,
and for training on A/B and testing on C/D look similar, but with the latter
giving somewhat lower accuracies - exactly what I would expect).

Now, I would like to adapt the permutation analysis accordingly. The first thing
I'm not sure about is
a) whether I should permute only C/D labels (such that the classifier is trained
on the real labels, but tested on permuted labels), or 
b) whether I should permute only A/B labels, or
c) both.

In the code below, I tried to set up the permutator as described in a). The
results look implausibly good: the p-values are very low even for voxels with
rather low accuracies. So something seems to be wrong with how I set up the
permutation analysis. Any suggestions on where the problem lies would be very
welcome!
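
To spell out what I mean by a) independently of PyMVPA, here is a minimal
NumPy-only sketch. It is not the code I actually ran: `permute_subset` is a
made-up helper, the toy `targets`/`chunks` arrays are invented for
illustration, and shuffling separately within each chunk is an assumption on
my side. Only the condC/condD labels are shuffled; the condA/condB labels
stay exactly where they were.

```python
import numpy as np

def permute_subset(targets, chunks, subset, rng):
    """Shuffle only the labels listed in `subset`, separately within each chunk.

    Hypothetical helper for illustration; labels outside `subset` are untouched.
    """
    targets = np.asarray(targets).copy()
    chunks = np.asarray(chunks)
    in_subset = np.isin(targets, subset)
    for c in np.unique(chunks):
        # indices of the to-be-permuted samples in this chunk
        idx = np.where(in_subset & (chunks == c))[0]
        targets[idx] = rng.permutation(targets[idx])
    return targets

# toy data (assumption): 2 chunks, 4 conditions, 2 samples per condition and chunk
rng = np.random.RandomState(0)
targets = ['condA', 'condB', 'condC', 'condD'] * 4
chunks = np.repeat([1, 2], 8)

shuffled = permute_subset(targets, chunks, ['condC', 'condD'], rng)
```

The number of condC and condD samples per chunk stays the same, so the
classes remain balanced in every test fold.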

Best,
Jan

#### relevant code snippets ####

train = ['condA', 'condB']
test = ['condC', 'condD']

labelsList = train + test

labelsDict = {"targets" : ['condC', 'condD']}

ds = fmri_dataset(samples=os.path.join(path, infile),
                targets=attr.targets,
                chunks=attr.chunks,
                mask=os.path.join(mask_path, mask_file))

# preprocessing
(...)

# Selecting labels
ds = ds[np.array([l in labelsList for l in ds.sa.targets], dtype="bool")]

partitioner = NFoldPartitioner(cvtype=1)

class MyFilter(Node):

    def __init__(self, target_groups, part_attr, target_attr,
                 space='filtered_partitions', **kwargs):
        self._target_groups = target_groups
        self._target_attr = target_attr
        self._part_attr = part_attr
        Node.__init__(self, space=space, **kwargs)

    def generate(self, ds):

        # binary mask for training and testing portion
        train_part = ds.sa[self._part_attr].value == 1
        test_part = ds.sa[self._part_attr].value == 2

        # binary mask for the first and second target group
        match_1st_group = [t in self._target_groups[0]
                           for t in ds.sa[self._target_attr].value]
        match_2nd_group = [t in self._target_groups[1]
                           for t in ds.sa[self._target_attr].value]

        # removed group1 in the training set and group2 in the testing set
        # as I'm only interested in the variant below

        # in the second to-be-returned dataset we will blank out
        # group2 in the training set and group1 in the testing set
        new_part = ds.sa[self._part_attr].value.copy()
        new_part[np.logical_and(train_part, match_2nd_group)] = 0
        new_part[np.logical_and(test_part, match_1st_group)] = 0
        ds.sa[self.get_space()] = new_part
        yield ds

chain = ChainNode([partitioner,
                    MyFilter((train, test),  
                            partitioner.get_space(),
                            'targets')
                    ])

clf = LinearCSVMC(C=1)

cv = CrossValidation(clf, partitioner,
                     errorfx=lambda p, t: np.mean(p == t),
                     enable_ca=["stats"])
sl = sphere_searchlight(cv, radius=3, postproc=mean_sample(),
                        space='voxel_indices', nproc=n_cpus)

result = sl(ds)  # result of searchlight analysis

sphere_acc = result.samples[0]
map2nifti(ds, sphere_acc).to_filename(<filename>)

n_permutations = 100
# zero-filled 2-D matrix with permutations x features
res_permutation = np.zeros((n_permutations, ds.nfeatures))
for i in range(n_permutations):  # run n permutations

    permutator = AttributePermutator(attr='targets', count=n_permutations,
                                     assure=True, limit=labelsDict)
    ds_tmp = permutator(ds)

    # run searchlight on dataset with shuffled class labels
    result_p = sl(ds_tmp)

    # add to permutation result array
    res_permutation[i,:] = result_p.samples[0]

# permutation stats
res_permutation -= result.samples[0]
res_permutation[res_permutation>=0] = 1.
res_permutation[res_permutation!=1] = 0.
inv_p_values = 1 - ((np.sum(res_permutation, axis=0) + 1.) /
                    (n_permutations + 1.))

# save (inverted) p-value map
map2nifti(ds, inv_p_values).to_filename(<filename>)
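
For reference, this is what I intend the permutation stats step above to
compute, written as a small NumPy-only sketch with made-up toy numbers.
`permutation_p_values` is a hypothetical helper name, and the sketch returns
the plain p-value (not 1 - p): the fraction of permutations whose accuracy is
at least as high as the observed one, with the usual +1 correction in both
numerator and denominator.

```python
import numpy as np

def permutation_p_values(observed, null):
    """Per-feature p-value: P(null accuracy >= observed accuracy).

    observed : 1-D array, one accuracy per feature
    null     : 2-D array, permutations x features
    """
    observed = np.asarray(observed, dtype=float)
    null = np.asarray(null, dtype=float)
    n_perm = null.shape[0]
    # count permutations that reach or exceed the observed accuracy
    exceed = (null >= observed).sum(axis=0)
    return (exceed + 1.0) / (n_perm + 1.0)

# toy example (invented numbers): 4 permutations, 2 features
observed = np.array([0.9, 0.5])
null = np.array([[0.5, 0.6],
                 [0.4, 0.5],
                 [0.6, 0.4],
                 [0.5, 0.7]])
p = permutation_p_values(observed, null)
```

With only 100 permutations the smallest possible p-value under this formula
is 1/101, so very low values in the saved map would already be suspicious.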