[pymvpa] feature sensitivity in MVPA run on EEG data

Brian Murphy brian.murphy at qub.ac.uk
Sat Mar 8 14:50:13 UTC 2014


Hi Marius,

I can't remember what the documentation says for PLR, but I believe you
are right that this implementation only does two-way classification. The
nearest off-the-shelf multinomial equivalent I can think of is SMLR.
Worth trying, though it uses sparse (L1-style) regularisation, meaning
it will try to set most of the betas to zero, and won't give you the
smooth sensitivity maps of an L2-regularised model. scikit-learn has a
heap of other classifiers, including a multiclass-capable logistic
regression in which you can specify either L1 or L2 regularisation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
... and which you can wrap into PyMVPA:
http://dev.pymvpa.org/examples/skl_classifier_demo.html
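
For what it's worth, a minimal sketch of that wrapping (assuming a
recent mvpa2 and scikit-learn; ds below just stands in for your usual
PyMVPA dataset):

    # wrap scikit-learn's multiclass logistic regression for use in PyMVPA
    from sklearn.linear_model import LogisticRegression
    from mvpa2.clfs.skl.base import SKLLearnerAdapter
    from mvpa2.generators.partition import NFoldPartitioner
    from mvpa2.measures.base import CrossValidation
    from mvpa2.misc.errorfx import mean_match_accuracy

    # L2-penalised logistic regression; handles 3+ classes natively
    clf = SKLLearnerAdapter(LogisticRegression(penalty='l2', C=1.0))

    # cross-validate it like any native PyMVPA classifier
    cv = CrossValidation(clf, NFoldPartitioner(),
                         errorfx=mean_match_accuracy)
    # accuracies = cv(ds)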

Regarding the idea of noise, think of this *scenario* - you are detecting
something in auditory areas, so temporal electrodes should contain the
interesting information (say T9 is the most informative). However, the
subject also blinks and makes eye movements. These artefact signals are
strongest at frontal electrodes (say Fpz), but are also present at a
lower amplitude in the temporal electrodes. So a smart regularised
regression might give a big weight to T9, and *subtract* a downscaled
version of Fpz, to recover what T9 would look like if the eye movements
weren't there.
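
If it helps, here is a toy simulation of that scenario in
numpy/scikit-learn (made-up numbers, only meant to show the sign of
the fitted weights):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    n = 2000
    y = rng.randint(0, 2, n)            # two conditions
    artefact = 2.0 * rng.randn(n)       # blinks / eye movements

    # T9 carries the condition signal plus a leaked, downscaled artefact;
    # Fpz is almost pure artefact
    t9 = y + 0.5 * artefact + 0.1 * rng.randn(n)
    fpz = artefact + 0.1 * rng.randn(n)

    X = np.column_stack([t9, fpz])
    clf = LogisticRegression(penalty='l2', C=1.0).fit(X, y)
    print(clf.coef_)  # positive weight on T9, *negative* weight on Fpz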

Make sense?

best,

Brian


On Fri, 2014-03-07 at 23:04 +0000, Marius 't Hart wrote:
> Hi Brian,
> 
> I've just tried PLR. I can get it to work with 2 categories but not with 
> 4. Is that some error on my part or is it correct that PLR only works 
> with 2 categories? I'd like to have a classifier that can also handle 3 
> or 4 categories.
> 
> Thanks!
> Marius
> 
> On 14-01-28 07:33 AM, Brian Murphy wrote:
> > Hi,
> >
> > just jumping into this discussion a bit late...
> >
> >> Tying in to another discussion, could it be beneficial to first average
> >> every 5 trials or so? In a way this reduces noise, so the performance
> >> would most likely go up - as might the informativeness of feature
> >> sensitivity. The downside is that you no longer have predictions on a
> >> trial-by-trial basis.
> > If you have enough data to get away with it (i.e. you will still have
> > enough cases to train on), then yes, it is worth trying, with one very
> > important caveat: it assumes you are interested in time-domain signals.
> > Obviously a straight trialwise averaging will wash out any interesting
> > spectral activity which isn't phase-locked (and given your task,
> > precise phase-locking seems unlikely). But anyway, averaging might
> > clean up the sensitivity maps. Then again, from the paper-writing point
> > of view, keeping things as simple as possible is always preferable.
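> >
> > A rough sketch of what that averaging could look like in PyMVPA (the
> > 'block' attribute is my own invention here, and ds is assumed to be
> > your usual dataset):
> >
> >     import numpy as np
> >     from mvpa2.mappers.fx import mean_group_sample
> >
> >     # bin every 5 consecutive trials of the same condition together
> >     block = np.zeros(len(ds), dtype=int)
> >     for t in ds.uniquetargets:
> >         idx = np.flatnonzero(ds.targets == t)
> >         block[idx] = np.arange(len(idx)) // 5
> >     ds.sa['block'] = block
> >
> >     # average samples sharing the same (targets, block) pair
> >     ds_avg = ds.get_mapped(mean_group_sample(['targets', 'block']))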
> >
> >>>>> Also: what preprocessing did you do? Any z-scoring, baseline correction etc?
> >>>> I do baseline correction, but no Z-scoring. Should I do Z-scoring? If so, over all data, within electrode or within trial?
> > I'm not an SVM expert, so this might not be relevant - but for many
> > classifiers, the weights are only interpretable as sensitivity measures
> > if the underlying variables are on a similar scale. So, for the sake of
> > argument, if your Cz was twice as loud as your Pz (unlikely, I know),
> > then its weights would be scaled down, and not be directly comparable.
> > So yes, for sensitivity analyses z-scoring of some kind would be
> > advisable. There are several ways: ideally you would do this based on
> > the *clean* rest periods (you've done manual artefact rejection, so
> > that should be possible), but for EEG data you can often just z-score
> > based on the whole signal time-course. [I see Nick O has made similar
> > suggestions]
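> >
> > In PyMVPA terms that could be something like the following (ds is
> > assumed to have a 'chunks' attribute; the 'rest' label only applies
> > if you coded your clean rest periods that way):
> >
> >     from mvpa2.mappers.zscore import zscore
> >
> >     # z-score each feature in place, separately within each chunk
> >     zscore(ds, chunks_attr='chunks')
> >
> >     # or estimate mean/std from the clean rest periods only
> >     # zscore(ds, param_est=('targets', ['rest']))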
> >
> >
> >> That doesn't look like what I expected - but I find it hard to judge if
> >> what I'm doing is actually correct.
> > There are a few reasons that could account for the differences you
> > see between the ERPs and the sensitivity maps:
> >   - different scaling of the input signals (as above)
> >   - more/less variance in the signals (looking at the ERPs, it looks
> >     like particular periods have better or worse separation between
> >     the conditions, but it is not just the magnitude of this
> >     difference that matters, rather its magnitude *relative to the
> >     variance* across trials)
> >   - models may also give weights to features that are good
> >     descriptions of noise, so that noise can be factored out of other
> >     condition-informative features; see this paper for details, also
> >     on how to normalise the sensitivity maps to compensate for this
> >     effect (a rough sketch follows below):
> > http://www.citeulike.org/user/emanueleolivetti/article/12177881
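> >
> > Roughly, as I read that paper, the normalisation amounts to this
> > (my paraphrase - do double-check it against the paper itself):
> >
> >     import numpy as np
> >
> >     # for a linear model, the interpretable 'forward' pattern is the
> >     # feature covariance applied to the weight vector
> >     def weights_to_pattern(X, w):
> >         # X: samples x features, w: classifier weights per feature
> >         return np.cov(X, rowvar=False).dot(w)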
> >
> > Regarding classifiers, LinSVM is good, but my preference would be a
> > regularised logistic regression (e.g. PLR), as I've yet to find a
> > situation in which any variety of SVM gives me a decisive performance
> > advantage. Also, consider the idea behind SVMs, which is to find a
> > hyperplane that best separates the boundary cases. If those boundary
> > cases are representative of the conditions in general, that is just
> > fine; but if they are outliers in some sense, then maybe not.
> >
> > Brian
> >
> 
> 

-- 
Dr. Brian Murphy
Lecturer (Assistant Professor)
Knowledge & Data Engineering (EEECS)
Queen's University Belfast
brian.murphy at qub.ac.uk




