[pymvpa] feature sensitivity in MVPA run on EEG data

Marius 't Hart mariusthart at gmail.com
Tue Mar 11 22:27:57 UTC 2014


Hi Jo,

Thanks for the interest and good questions!

I'm using channels as features here, and only two of them. As you might 
see in the mailing-list history, Cz and Pz show a nice CNV effect, but 
in Cz it is mainly the initial CNV and in Pz mainly the terminal CNV. I 
would expect a classifier to be more sensitive to Cz at the start of the 
interval and then to shift to Pz later on. None of this is revolutionary 
in a scientific sense, so I thought it would be a nice toy problem for 
me to learn MVPA on. I apply the z-scoring to the Cz and Pz signals 
before averaging the trials.

I should add that I divide the interval into sub-intervals. Within each 
sub-interval I average the signal of each electrode, so that every 
feature is a single value. This makes it easier for me to run the 
classifier, but it should also remove a lot of noise.
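
For concreteness, this is roughly what that preprocessing looks like as 
a minimal numpy sketch. The names (preprocess_epoch, n_bins) are just 
illustrative, I'm assuming one 2-D array per trial with time along the 
first axis and the two channels along the second, and exactly where the 
z-scoring happens doesn't matter for the sketch:

import numpy as np

def preprocess_epoch(epoch, n_bins=10):
    """epoch: array of shape (n_timepoints, 2), columns = [Cz, Pz]."""
    # z-score each channel (here per epoch, just for the sketch)
    z = (epoch - epoch.mean(axis=0)) / epoch.std(axis=0)
    # split the preparation interval into equally long sub-intervals
    bins = np.array_split(z, n_bins, axis=0)
    # average over time within each sub-interval:
    # one value per channel per sub-interval
    return np.vstack([b.mean(axis=0) for b in bins])

# result: shape (n_bins, 2); row i is the 2-feature sample (Cz, Pz)
# that goes into the classifier for sub-interval i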

There might be an interaction with time, but that should be present in 
any of the data, especially in the case where the trials are not grouped 
at all (or are grouped in sets of size 1, if you will). That's why I 
think averaging consecutive trials this way is most similar to having 
trial-based data.
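
The kind of trial averaging I mean looks roughly like this (a sketch 
only; average_consecutive is not a real PyMVPA function, just an 
illustration of taking N consecutive same-condition trials and dropping 
the remainder):

import numpy as np

def average_consecutive(samples, labels, n=5):
    """Average runs of n consecutive trials that share the same label;
    trials that don't fill a complete set of n are discarded."""
    out_x, out_y = [], []
    for lab in np.unique(labels):
        x = samples[labels == lab]        # keep presentation order
        for i in range(len(x) // n):      # complete sets only
            out_x.append(x[i * n:(i + 1) * n].mean(axis=0))
            out_y.append(lab)
    return np.array(out_x), np.array(out_y)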

Yes, there are blocks, which impose breaks. I tried to talk to the 
participant in every break, to remind them of a change in the 
instructions and to get them to take their mind off the task for a bit, 
as well as to offer an actual break, which they were free to take but 
usually didn't. The trials were mixed up more or less randomly, so 
within each block of 120 trials there should be around 15 trials in each 
of the 8 conditions. Given the rate of rejected trials, that would leave 
sets of probably 10 to 15 trials for averaging, which is still higher 
than what seems to be optimal. I will give it a shot though.

I do cross-validation with an N-fold partitioner using a linear SVM (and 
I've tried PLR and kNN as well). Linear SVM seems to be pretty standard 
and works better than kNN for some reason, so it seems like a safe 
choice. But I have no experience with MVPA, so I might be totally wrong 
there.

The cross-validator is given this function:

errorfx=lambda p, t: np.mean(p == t)

which should ensure that it returns performance as hit rate rather than 
error rate, right? Otherwise a score of 0 would actually imply perfect 
performance...
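
For reference, the whole setup is roughly this (a minimal sketch; "ds" 
stands for my dataset with targets and chunks already assigned, and 
LinearCSVMC is what I mean by linear SVM):

import numpy as np
from mvpa2.clfs.svm import LinearCSVMC
from mvpa2.generators.partition import NFoldPartitioner
from mvpa2.measures.base import CrossValidation

clf = LinearCSVMC()
cv = CrossValidation(clf, NFoldPartitioner(),
                     # return accuracy (hit rate) per fold, not error rate
                     errorfx=lambda p, t: np.mean(p == t))
res = cv(ds)          # ds: Dataset with sa.targets and sa.chunks
print(np.mean(res))   # mean hit rate across folds; chance for 8 classes is 1/8

If I read the docs correctly, mean_match_accuracy from 
mvpa2.misc.errorfx should do the same thing as that lambda.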

Thanks,
Marius

On 14-03-11 04:48 PM, J.A. Etzel wrote:
> Hmm; this is worryingly sensitive, especially since you have so many 
> examples. How many dimensions do you have (I guess channels here, not 
> voxels)? Did you average the z-scored data or "raw" data?
>
> I wonder if there is some sort of interaction with time (what order 
> the trials were completed in); I know the effects of movement, 
> fatigue, etc. are very different with EEG than fMRI, but assume that 
> there is some effect. It might help understand the averaging effect if 
> you mix up which trials are averaged together (e.g. pick five at 
> random instead of consecutive trials). Are there some natural breaks 
> in the data (e.g. rest periods or different trial conditions)? If so, 
> perhaps averaging within the epochs might produce more sensible results.
>
> Also, what cross-validation scheme are you using? How do you adjust it 
> for the number of examples?
>
> good luck,
> Jo
>
>
> On 3/7/2014 1:42 PM, Marius 't Hart wrote:
>> Then I also tried averaging across several trials, by taking N
>> consecutive trials within the same condition, and discarding the
>> remaining trials. The minimum number of trials that were acceptable for
>> analysis within the conditions and across subjects was 70 (one subject
>> had 104, average close to 90). When I average across 23 trials (so that
>> there are a minimum of 3 targets within each condition) the noisiness of
>> the data should be minimal, but the Linear SVM performs at 0% for many
>> participants across the whole preparation interval and on average at
>> slightly above 20%... well below chance! Something must be terribly
>> wrong there. When averaging around 5 trials (so that there are 35
>> targets or more in each condition) performance looks better. Using sets
>> of 10, 15 and 20 trials progressively decreases performance. So it seems
>> that somewhere around 5 there is an optimum. I'm not sure how to pick a
>> good value here, without trying them all and picking the one that
>> performs best.
>
