[pymvpa] How to evaluate the goodness of classification for an unlabelled example?

Roberto Guidotti robbenson18 at gmail.com
Fri Jan 18 14:38:39 UTC 2013


Thank you for the quick response and your advice.

I've tried to figure it out from the SVM probabilities, but the sigmoid is
applied to the distance from the hyperplane, right? So an example can be far
from one class and even farther from the other and still get a confident
probability. Suppose we classify using a single feature x, with zero as the
point that separates the two classes: the first class is distributed around
x = 1 and the second around x = -1. An example with x = 300 is classified as
the first class, but it is far from the first class's distribution and very,
very far from the second. Perhaps I could assess this using the Mahalanobis
distance or something similar.
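To make the x = 300 example concrete, here is a small numpy sketch (mine, not from the thread): a Platt-style sigmoid on the signed hyperplane distance still yields a near-certain probability for a point far from both classes, while the Mahalanobis distance to each class distribution does flag it as an outlier. The sigmoid parameters A and B are arbitrary values chosen for illustration, not fitted ones.

```python
import numpy as np

# Two 1-D classes: class +1 around x = 1, class -1 around x = -1,
# both with unit variance; the decision boundary is at x = 0.
rng = np.random.default_rng(0)
class_pos = rng.normal(1.0, 1.0, size=500)
class_neg = rng.normal(-1.0, 1.0, size=500)

def platt_sigmoid(d, A=-1.0, B=0.0):
    """Platt-style map from signed hyperplane distance d to P(class +1).
    A and B would normally be fit on held-out decision values; these
    defaults are arbitrary, for illustration only."""
    return 1.0 / (1.0 + np.exp(A * d + B))

def mahalanobis_1d(x, sample):
    """Mahalanobis distance of x to a 1-D sample: |x - mean| / std."""
    return abs(x - sample.mean()) / sample.std()

x = 300.0                            # the far-away example
print(platt_sigmoid(x))              # ~1.0: looks like a confident class +1
print(mahalanobis_1d(x, class_pos))  # huge: far from class +1 as well
print(mahalanobis_1d(x, class_neg))  # huge: far from class -1 too
```

The sigmoid only encodes which side of the boundary the point is on and saturates with distance, so it cannot distinguish "confidently class +1" from "nothing like either class"; the per-class distance can.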

I'll try SMLR, which is probably good enough!

Thank you Yaroslav!
R

PS: Anyone who would like to enrich the discussion is welcome! :))

On 18 January 2013 15:23, Yaroslav Halchenko <debian at onerussian.com> wrote:

>
> On Fri, 18 Jan 2013, Roberto Guidotti wrote:
>
> >    Dear all,
> >    I have a question that does not strictly concern PyMVPA.
> >    I trained a classifier to discriminate two classes (e.g. bananas and
> >    apples) using SVM, cross-validation, etc. Then I would like to try it
> >    with some "unlabelled" fruits: they could be bananas and apples, but
> >    also melon, lemon, or strawberries. If I try to classify a melon, the
> >    label assigned by the classifier could be banana. How can I establish
> >    a probability level for this fruit? I mean, if I use the SVM distance
> >    from the hyperplane, the melon could be distant from bananas and even
> >    further from apples (hyperspaces), so in my opinion this is not a
> >    good index for that. I would like an index that tells me it is a
> >    banana only with a higher probability than apple: p(banana) = 0.3,
> >    p(apple) = 0.1, for example.
>
> What about using SMLR? As a logistic regression, its decision is based
> on the maximum of the probabilities for each possible (trained) label.
> So just set enable_ca=['estimates'] and there (in .ca.estimates) you
> will get the probabilities per target label for the last .predict call.
>
> As for SVM: enable the estimation of probabilities (I believe libsvm
> fits a sigmoid in the neighborhood of the decision boundary) with
> probability=1, and then get them from .ca.probabilities.
>
>
> Or some other classifier? GDA, LDA, GNB...
>
> Would that help?
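[Both suggestions above can be sketched outside PyMVPA; the following uses scikit-learn as a stand-in, which is my assumption and not what the thread itself uses. SVC(probability=True) exposes libsvm's Platt-scaled probabilities, analogous to probability=1 and .ca.probabilities, and LogisticRegression plays the role of SMLR's per-label estimates (.ca.estimates).]

```python
# Sketch of both suggestions with scikit-learn standing in for PyMVPA's
# classifiers (an assumption; the thread uses PyMVPA's SMLR and libsvm).
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 1)),    # "banana" around x = 1
               rng.normal(-1.0, 1.0, (50, 1))])  # "apple" around x = -1
y = np.array(['banana'] * 50 + ['apple'] * 50)

# SVM with Platt scaling: libsvm fits a sigmoid to the decision values.
svm = SVC(kernel='linear', probability=True, random_state=0).fit(X, y)

# Multinomial logistic regression: per-label probabilities, like SMLR.
lr = LogisticRegression().fit(X, y)

melon = np.array([[300.0]])      # far from both training classes
print(svm.predict_proba(melon))  # still near-certain for one class
print(lr.predict_proba(melon))   # likewise; rows always sum to 1
```

Note that both sets of probabilities necessarily sum to one, so an outlier still looks confident; this is exactly the concern raised above, and why a distance-based check may be needed in addition.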
> >    Hope it is an exhaustive and answerable question!
> >    Thank you
> >    Roberto
>
> > _______________________________________________
> > Pkg-ExpPsy-PyMVPA mailing list
> > Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> >
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>
>
> --
> Yaroslav O. Halchenko
> Postdoctoral Fellow,   Department of Psychological and Brain Sciences
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik
>