[pymvpa] On below-chance classification (Anti-learning, encore)

Thu Jan 31 09:35:31 UTC 2013

I believe below-chance accuracy is a natural phenomenon in classification theory. This becomes obvious when one computes the permutation distribution for a given problem: in a two-category problem the distribution has a peak around 50% accuracy, so there will always be some (or a lot of) below-chance values. This is more likely to happen when the dataset has few samples and, probably, when the data are high-dimensional. I am not sure whether a procedure to relabel the data, or any other fine-tuned algorithm, would be considered 'tweaking the results'.
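
A rough illustration of the sample-size point, as a sketch only. It assumes that every test trial is an independent coin flip at chance level, so the number of correct predictions is binomial; cross-validated accuracies are not exactly independent, so the numbers are indicative at best. The function name prob_at_most is made up for this example.

```python
from math import comb

def prob_at_most(k_correct, n_trials, chance=0.5):
    """P(number correct <= k_correct) for a classifier guessing at `chance`.

    Assumption: trials are independent, so the count of correct
    predictions follows Binomial(n_trials, chance).
    """
    return sum(
        comb(n_trials, k) * chance**k * (1 - chance) ** (n_trials - k)
        for k in range(k_correct + 1)
    )

# Probability of scoring 40% or lower by pure guessing in a 2-way problem,
# for increasing numbers of trials.
for n_trials in (20, 80, 320):
    k_correct = (2 * n_trials) // 5  # 40% of the trials
    print(n_trials, round(prob_at_most(k_correct, n_trials), 4))
```

Under these assumptions, an accuracy of 40% or lower turns up by pure guessing in roughly a quarter of cases with 20 trials, and the probability shrinks as the number of trials grows.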

>My question is this: How much (statistical?) merit would there be in coming up with some sort of index to show how far a given classification accuracy is from absolute chance for this classification?

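A p-value obtained via permutation testing is a better candidate to answer this question (e.g., p<0.01): it is the fraction of label permutations that yield an accuracy at least as high as the observed one.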
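
A minimal sketch of such a permutation test, written in plain NumPy rather than with PyMVPA's own facilities. The nearest-centroid classifier, the interleaved 5-fold split, the simulated data, and the names cv_accuracy and permutation_test are all assumptions made for illustration; they are not taken from the analysis discussed in this thread.

```python
import numpy as np

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    folds = np.arange(len(y)) % n_folds  # fixed, interleaved fold assignment
    classes = np.unique(y)
    correct = 0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        # One mean pattern per class, estimated on the training folds only.
        centroids = np.array([X[train & (y == c)].mean(axis=0) for c in classes])
        # Squared Euclidean distance of every test sample to every centroid.
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += np.sum(classes[dists.argmin(axis=1)] == y[test])
    return correct / len(y)

def permutation_test(X, y, n_perm=1000, seed=0):
    """Return observed accuracy, null accuracies, and a one-sided p-value."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y)
    # Null distribution: repeat the whole cross-validation with shuffled labels.
    null = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Simulated example: 2 classes x 40 trials, 50 features, 5 of them informative.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, 50))
X[y == 1, :5] += 2.0
observed, null, p_value = permutation_test(X, y, n_perm=500)
print(observed, null.mean(), p_value)
```

The null accuracies scatter around chance, so a good share of them fall below 50% even though nothing is wrong with the classifier. The p-value as written is one-sided (above chance only); judging accuracies below chance as well would need a two-sided comparison instead.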

Regards,
-Rawi

>________________________________
> From: Jacob Itzhacki <jitzhacki at gmail.com>
>To: pkg-exppsy-pymvpa at lists.alioth.debian.org 
>Sent: Thursday, January 31, 2013 9:20 AM
>Subject: [pymvpa] On below-chance classification (Anti-learning, encore)
> 
>
>Dear all,
>
>
>First off, pardon me if anything I say is already described somewhere else. I've done quite a bit of searching and reading on the subject (e.g., including Dr. Kowalczyk's lecture), but it is always possible to have missed something in this internet age. After reading as much as I could about the problem, I've noticed that the proposed workarounds don't really fix it. I am facing it quite a bit, to the point that around 1/3 of classifications are below chance accuracy (38-42% for 2-way or 17-19% for 4-way). I would like some feedback on an idea I've had for keeping this data useful.
>
>
>My question is this: How much (statistical?) merit would there be in coming up with some sort of index to show how far a given classification accuracy is from absolute chance for this classification?
>
>
>Elaborating, the index would be the absolute value of the difference between the resulting accuracy and chance level. Say, for a 2-way classification (with a 50% chance level) in which you obtain accuracies of 38% and 62% in two different instances, the difference from chance would be 12% for both, which would make them equivalent.
>
>
>Please offer as much criticism as you can to this approach.
>
>
>Thanks in advance,
>
>
>Jacob
>
>
>
>
>PS. For completeness' sake, I'll list the things I've tried.
>
>
>I'm running the classification on fMRI data obtained from a paradigm that gives the following classification opportunities:
>
>
>a. 4 categories, with 40 trials each at its fullest use (160 trials)
>b. 2 categories, with 80 trials each, by treating pairs of the original categories as one.
>c. 2 categories, with 40 trials each, by disregarding 2 of the conditions.
>
>
>I am also using a total of 8 different ROIs.
>
>
>I have tried reordering the trials for one of the subjects; however, this results in above-chance accuracies in one analysis and below-chance accuracies in the other for the same ROI, which gets rather frustrating if I want to do some sort of averaging at the end. Still, there seems to be some consistency in which classifications move away from chance, which leads me once again to believe that there is in fact some learning even in the below-chance classifications. But the seeming anti-learning baffles me. What does it mean?! (And how is it even possible? O.o)
>
>
>Thanks again.