[pymvpa] effect size (in lieu of zscore)
J.A. Etzel
jetzel at artsci.wustl.edu
Tue Jan 3 20:43:32 UTC 2012
Ah, rather different problem than I'd thought. Below-chance accuracies
are a big problem with fMRI data ... sometimes they happen when data is
poorly fitted (e.g. improper scaling), sometimes with mistakes (e.g.
mislabeled cases, unbalanced training data), sometimes for no apparent
reason. Interpreting these is an open issue in MVPA, I'd say.
Is this your first MVPA with this data? Searchlight analysis can be good
for identifying very localized information, but it can cause big trouble in
other cases (I have a paper coming out in the MLINI proceedings about
some of these issues and can send you a draft version if you'd like). It is
also very sensitive to processing choices (normalization, etc.) and can be
very unwieldy to troubleshoot.
I think you said these are sound stimuli? Perhaps you could set up an
ROI-based analysis with positive and negative controls (e.g. auditory
cortex if that should classify for sure and something like primary motor
if that should not classify). Running an ROI-based analysis is a lot
quicker than a searchlight, and should let you test the impact of things
like averaging trials together. Once that's producing sensible results you
can go back to a searchlight analysis if that's what you really need.
As a side note, I've found averaging trials together to be quite useful
sometimes, particularly for between-subjects analyses. But you need to
be very careful when averaging only a few trials at a time within a run:
trials from the same run will pretty much always be more similar to each
other than to trials from other runs, which can bias the results (e.g. if
one of the averaged trials from run #2 is in the training set and another
is in the testing set, it might classify better than when all averaged
trials from run #2 are in the testing set). I would either average to one
per run or create partitions so that these sorts of splits can't happen
(e.g. leave-one-run-out partitioning).
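
To make those two options concrete, here is a rough sketch in plain NumPy
rather than PyMVPA itself. The function names (average_per_run,
leave_one_run_out) and the toy data are made up for illustration, and I've
assumed "one per run" means one averaged sample per label within each run,
so adapt as needed:

```python
import numpy as np

def average_per_run(samples, labels, runs):
    """Average all trials sharing a (run, label) pair into one sample."""
    out_x, out_y, out_r = [], [], []
    for r in np.unique(runs):
        for lab in np.unique(labels):
            mask = (runs == r) & (labels == lab)
            if mask.any():
                out_x.append(samples[mask].mean(axis=0))
                out_y.append(lab)
                out_r.append(r)
    return np.array(out_x), np.array(out_y), np.array(out_r)

def leave_one_run_out(runs):
    """Yield (train_idx, test_idx); a run is never split across the two."""
    for r in np.unique(runs):
        test = np.flatnonzero(runs == r)
        train = np.flatnonzero(runs != r)
        yield train, test

# Toy data (hypothetical): 3 runs x 2 labels x 3 trials, 5 "voxels".
rng = np.random.default_rng(0)
runs = np.repeat([1, 2, 3], 6)
labels = np.tile(np.repeat(["a", "b"], 3), 3)
samples = rng.normal(size=(18, 5))

# Option 1: collapse to one averaged sample per label per run.
avg_x, avg_y, avg_r = average_per_run(samples, labels, runs)

# Option 2: partition by run, so same-run samples stay on one side.
splits = list(leave_one_run_out(avg_r))
```

The same leave_one_run_out generator also works on the un-averaged (or
partially averaged) data if you pass it the original run labels; the point
is only that the fold boundaries follow the run boundaries.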
Jo
On 1/3/2012 2:07 PM, Mike E. Klein wrote:
> Hi Jonas and Jo,
>
> Thanks for helping out with this!
>
> So:
>
> (1) I haven't done a permutation test. By "chance distribution" I just
> meant the bulk of the data points using my real-label-coded data. While
> I'm obviously hoping for a histogram that contains a positive skew, /at
> worst/ I'd expect a normal distribution centered around chance. Once I
> get this error figured out, I will do some permutation testing as well,
> but at the moment it doesn't seem necessary. (In other words, with real
> data or fake data, I can't see why I'd ever see a /negative/ skew unless
> I'm doing something else wrong.)
>
> (2) I've generally been doing 3-to-1 averaging of my trials because of
> time/processing limitations; because python seems to choke on a
> searchlight analysis using LinearCSVMC if I don't first perform
> run-averaging; and because I'm concerned about the cleanliness of my
> data (sparse-sampling images). I'm re-running one of these analyses
> using LinearNuSVMC and without any run-averaging, but my hunch is that
> this isn't the problem. PyMVPA, after run-averaging, is showing 2 labels
> with 27 examples each, which is what I was expecting.
>
> (3) This seems to be an issue with multiple subjects...it might in fact
> be a universal problem. I saw it sporadically before, but then it seems
> my preprocessing was incorrect (I was zscoring with only 11 samples in
> mini-runs, instead of the 30+ I had intended).
>
> (4) My distribution is centered on 0 and goes from -50 to +50 because,
> after the searchlight analysis I'm using:
>
> #set up searchlight
> sl = sphere_searchlight(cv, radius=3, space='voxel_indices',
> center_ids=center_ids, postproc=mean_sample())
> #run searchlight on ds
> s1_map = sl(dataset)
> #convert into percent-wise accuracies centered around zero
> s1_map.samples *= -1
> s1_map.samples += 1
> s1_map.samples *= 100
> s1_map.samples -= 50
>
> -Mike