[pymvpa] mean_group_sample Mapper with NFoldPartitioner

Wed Feb 22 14:08:09 UTC 2012

Hi pymvpa list,

I have a question about the mean_group_sample FxMapper:

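from mvpa.suite import *  # PyMVPA 0.6 namespace

# average samples per target/chunk combination, then apply SVD
mapper = ChainMapper([mean_group_sample(['targets', 'chunks']), SVDMapper()])
clf = MappedClassifier(LinearCSVMC(), mapper)
cvte = CrossValidation(clf, NFoldPartitioner(),
                       enable_ca=['repetition_results', 'stats'])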

Let's say I have a dataset with 2 chunks, each containing 2 targets:

ds.C = [1 1 1 1 2 2 2 2]
ds.T = [1 1 2 2 1 1 2 2]

The mean_group_sample(['targets','chunks']) mapper averages the samples
within each target/chunk combination, so the result has one mean sample
per target in each chunk:

ds.C = [1 1 2 2]
ds.T = [1 2 1 2]
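
To make the averaging concrete, here is a minimal pure-Python sketch of
the grouping I mean. It is not PyMVPA code: mean_by_group is a
hypothetical helper, and the sample values are made up for illustration.

```python
from collections import defaultdict


def mean_by_group(samples, targets, chunks):
    """Average all samples that share the same (chunk, target) pair.

    Hypothetical helper for illustration only; not part of PyMVPA.
    """
    groups = defaultdict(list)
    for row, t, c in zip(samples, targets, chunks):
        groups[(c, t)].append(row)

    out_samples, out_targets, out_chunks = [], [], []
    for (c, t) in sorted(groups):
        rows = groups[(c, t)]
        # column-wise mean over the rows in this group
        mean_row = [sum(col) / float(len(rows)) for col in zip(*rows)]
        out_samples.append(mean_row)
        out_targets.append(t)
        out_chunks.append(c)
    return out_samples, out_targets, out_chunks


chunks = [1, 1, 1, 1, 2, 2, 2, 2]
targets = [1, 1, 2, 2, 1, 1, 2, 2]
# made-up data: 8 samples with 2 features each
samples = [[float(i), float(2 * i)] for i in range(8)]

avg_samples, avg_targets, avg_chunks = mean_by_group(samples, targets, chunks)
print(avg_chunks)   # [1, 1, 2, 2]
print(avg_targets)  # [1, 2, 1, 2]
```

The 8 input samples collapse to 4, one per target/chunk pair, which
matches the ds.C and ds.T values listed above.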

That all works until I try to use it in a ChainMapper with an
NFoldPartitioner, as shown above.

It seems that the partitioner doesn't produce the same number of targets
in the training and testing splits. In my case, there are 8 chunks with
25 stimuli per chunk, divided into 5 targets (5 stimuli per target
condition). Using mean_group_sample raises the following error:

ValueError: Collectable 'targets' with length [25] does not match the
required length [5] of collection '<SampleAttributesCollection>'.
 >/lib/python2.6/site-packages/mvpa/base/collections.py(558)__setitem__()
     557                                 ulength,
--> 558                                 str(self)))
     559         # tell the attribute to maintain the desired length
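
The two lengths in the error are consistent with my design, as the
arithmetic below shows. This is only my guess at where the numbers come
from; I have not traced it through the PyMVPA source, and I am assuming
the partitioner leaves out one chunk per fold.

```python
# Design described above
n_chunks = 8
n_targets = 5
stimuli_per_target = 5

samples_per_chunk = n_targets * stimuli_per_target   # 25 stimuli per chunk

# Assumption: each fold holds out exactly one chunk for testing
test_len_before = samples_per_chunk                   # unaveraged test split
test_len_after = n_targets                            # one mean per target

train_len_before = (n_chunks - 1) * samples_per_chunk
train_len_after = (n_chunks - 1) * n_targets

print(test_len_before, test_len_after)    # 25 5
print(train_len_before, train_len_after)  # 175 35
```

So the 25 looks like the unaveraged targets of one test chunk, and the 5
like the number of samples left after averaging.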

Is there a way to use the mean_group_sample mapper with
NFoldPartitioner() so that the sample attributes in the training and
testing splits have the correct lengths?

I am running PyMVPA version 0.6.0~rc2 on Linux 2.6.18-308.el5
(Red Hat 5.8, Tikanga).

I have already hand-coded what I need, but I would like to understand
the PyMVPA framework better.

Thank you in advance for any insight into this mapper and partitioner
interaction.

Best regards,

Michael