[pymvpa] Justification for trial averaging?

Brian Murphy brian.murphy at qub.ac.uk
Thu Jan 23 17:52:38 UTC 2014


Hi,

I'm not sure about the motivation for averaging in that particular paper
- if I had to guess, they may have chosen a simple exposition to present
what was, at that time, a completely novel approach.

But averaging can work as a simple but effective method to improve the
signal/noise ratio in individual test or training cases. The
corresponding trade-off is that you have fewer cases to train from, and
would need a higher rate of success to pass any significance threshold
you might have.
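
To make that trade-off concrete, here is a toy sketch in plain NumPy
(made-up numbers, nothing to do with your actual data): averaging groups
of k trials shrinks the noise standard deviation by roughly sqrt(k), but
also divides the number of samples available to the classifier by k.

import numpy as np

rng = np.random.RandomState(0)
signal = rng.randn(100)                  # hypothetical "true" pattern for one category
trials = signal + rng.randn(40, 100)     # 40 noisy single-trial measurements of it
averages = trials.reshape(5, 8, 100).mean(axis=1)   # average groups of 8 trials

print("single-trial noise std: %.2f" % (trials - signal).std())    # ~1.0
print("averaged noise std:     %.2f" % (averages - signal).std())  # ~1/sqrt(8), i.e. ~0.35
# but only 5 samples remain for the classifier instead of 40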

A simple experiment might be to compare the classification accuracy, the
significance thereof, and the sensitivity maps of the following (see the
rough sketch after the list):
 - train on averages, test on averages
 - train on trials, test on averages (could get similar classification
accuracy if your classifiers are dealing well with noise, collinearity,
etc.)
 - train on trials, test on trials (should get lower classification
accuracy, but be similarly significant)
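
In PyMVPA that comparison could look roughly like the sketch below. This
assumes a dataset `ds` with the usual `targets` and `chunks` sample
attributes already loaded (e.g. the tutorial dataset); the single held-out
chunk in variant 2 is only for illustration, a proper version would loop
over all chunks.

import numpy as np
from mvpa2.clfs.svm import LinearCSVMC
from mvpa2.generators.partition import NFoldPartitioner
from mvpa2.mappers.fx import mean_group_sample
from mvpa2.measures.base import CrossValidation

# `ds` is assumed to exist: a PyMVPA dataset with sa.targets and sa.chunks
clf = LinearCSVMC()
cv = CrossValidation(clf, NFoldPartitioner(),
                     errorfx=lambda p, t: np.mean(p == t))

# average all trials of each category within each chunk (run)
ds_avg = ds.get_mapped(mean_group_sample(['targets', 'chunks']))

# 1) train on averages, test on averages
print("avg/avg:     %.2f" % np.mean(cv(ds_avg).samples))

# 3) train on trials, test on trials
print("trial/trial: %.2f" % np.mean(cv(ds).samples))

# 2) train on trials, test on averages: done by hand for one held-out chunk
test_chunk = ds.sa.chunks[0]
clf.train(ds[ds.sa.chunks != test_chunk])
pred = clf.predict(ds_avg[ds_avg.sa.chunks == test_chunk].samples)
truth = ds_avg[ds_avg.sa.chunks == test_chunk].sa.targets
print("trial/avg:   %.2f" % np.mean(np.asarray(pred) == truth))

The averaging step with mean_group_sample(['targets', 'chunks']) is the
same kind of collapsing the mappers tutorial you mention performs, so
variant 1 should reproduce that setup.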

Brian


On Thu, 2014-01-23 at 17:37 +0000, Shane Hoversten wrote:
> 
> I have a question about trial averaging in MVPA, by which I mean
> taking the average response of a certain stimulus class, and using
> this average value as input to the classifier, instead of feeding it
> the responses from the individual trials themselves.
> 
> For instance, in the original Haxby experiment[1] (referred to in the
> PyMVPA documentation and tutorial) each subject does two runs, and
> each run produces 12 time series, each of which includes 8 blocks, one
> for each stimulus category ('bottle' 'cat' 'chair' 'face' 'house'
> 'scissors' 'scrambledpix' 'shoe'). I had some trouble following
> exactly what they’re collecting in each block, but the block is 24
> seconds long, so it’s a bunch of exemplars of the category in
> question.
> 
> But in the ‘mappers’ section of the tutorial[2] the data is collapsed
> into 2 runs x 8 samples per run.  So the responses for all the stimuli
> in each category (‘faces’, ‘scissors’, etc.) are averaged across the
> blocks in all 12 training sessions, producing 1 canonical sample for
> each of the categories (for each of the 2 runs). And these ‘canonical
> samples’ are what is being used for classification purposes.
> 
> The question is, why do it this way?  The practice seems to be widely
> used (although I can’t cite another reference off the top of my
> head).  It seems to me that this amounts to pre-classification, where
> you’re taking a ‘typical’ face/scissors/whatever, and seeing if the
> classifier can distinguish between these different kinds of
> typicality.  But forming decision boundaries over features is exactly
> what a classifier is meant to do, so why not just throw all these
> different exemplars into the mix, and let the classifier figure out
> its own notion of prototypicality?  And if you’re going to
> pre-classify, why pick the average response?  Why not take some other
> kind of lower-dimensional input, such as the first several
> eigenvectors, or some other summary?
> 
> I understand that this can be empirically answered (try a bunch of
> things; do what works best) but could someone enlighten me as to the
> theoretical justification of one choice over another?
> 
> [1] http://www.sciencemag.org/content/293/5539/2425.abstract 
> [2] http://www.pymvpa.org/tutorial_mappers.html  
> 
> 
> 
> Shane

-- 
Dr. Brian Murphy
Lecturer (Assistant Professor)
Knowledge & Data Engineering (EEECS)
Queen's University Belfast
brian.murphy at qub.ac.uk



