<br><br><div class="gmail_quote">On Dec 6, 2007 12:02 PM, Yaroslav Halchenko <<a href="mailto:debian@onerussian.com">debian@onerussian.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">> I started to stick the mahalanobisDistance function in with the<br>> metrics, but then realized that this is not where mahalanobis would<br>> ever be used.<br></div>who knows... who knows... at least a more generic
<br>mahalanobisDistance could easily be used with the searchlight at the moment if<br>its aim is to define the neighbors for the searchlight, but as you<br>pointed out that is not what you want.<br><br>BTW - I've merged your branch and made some small changes so your code
<br>became a bit more pylint-friendly, which Michael and I agreed to<br>use to enforce more or less uniform formatting of the code. We are<br>supposed to run pylint on everything we do... but it has been done only<br>sporadically, thus we have some "Make pylint happier" commits from
<br>time to time ;-)<br></blockquote><div><br>Sounds good, I'll try and be more pylint-friendly :)<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>Also there is an agreement to use epydoc with restructuredtext for API<br>documentation, thus we are trying to describe parameters to the<br>functions the way I've done for the mahalanobisDistance. You can<br>generate all documentation simply by
<br>make doc<br>or just<br>make apidoc<br>if you only want the epydoc-generated pages<div class="Ih2E3d"></div></blockquote><div><br>Yup, I'll add in the proper docs.<br><br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d"><br>> neighborhood information for a voxel, it's actually more like a<br>> classifier.<br></div>I see ;-)<br><div class="Ih2E3d"><br>> The other difference is that there are major
<br>> optimizations I've implemented for calculating the pairwise distances<br>> on a whole set of vectors, not just two at a time.<br></div>I wonder if maybe we should make all distance functions able to
<br>operate on lists of points instead of just a pair of points... although<br>as of now it might lead to a bit of code duplication inside of them<br><div class="Ih2E3d"></div></blockquote><div><br>Yes, you can do loads at once much faster than looping.
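<br><br>Roughly the kind of thing I mean, as a sketch (plain NumPy; <code>pairwise_sqeuclidean</code> is a made-up name, not anything in the codebase yet):

```python
import numpy as np

def pairwise_sqeuclidean(x, y):
    """Squared Euclidean distances between every row of x and every row of y.

    x: (n, d) array, y: (m, d) array -> (n, m) array of distances.
    Uses the expansion ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so the whole
    matrix comes out of one matrix product instead of n*m Python-level loops.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xx = (x ** 2).sum(axis=1)[:, np.newaxis]   # (n, 1)
    yy = (y ** 2).sum(axis=1)[np.newaxis, :]   # (1, m)
    d2 = xx + yy - 2.0 * x.dot(y.T)
    return np.maximum(d2, 0.0)                 # clip tiny negatives from rounding

x = np.random.rand(100, 10)
y = np.random.rand(50, 10)
fast = pairwise_sqeuclidean(x, y)

# the loopy equivalent, for comparison
slow = np.array([[((a - b) ** 2).sum() for b in y] for a in x])
assert np.allclose(fast, slow)
```

The same expansion carries over to the Mahalanobis case by putting the inverse covariance inside the inner products.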
<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> And I have no secrets for how I want to use it. I've been thinking
<br>> about a supervised mahalanobis distance kernel for classification<br>> along with Francois Meyer. The basic idea is that you would take into<br>> account the underlying distributions of the labeled samples when
<br>> calculating the kernel distances at training and when determining the<br>> distances for the test points.<br></div>am I reading it right: calculating the kernel distances at training and<br>using that distance metric later on when determining the distances for
<br>the test points. Right? Not that you would use the testing points as well to<br>determine the underlying distributions (which would bias the generalization<br>estimate since the classifier would see the testing data)<br><br>if you provide training in x and testing in y, the covariance gets computed
<br>using all of them... imho that is not acceptable if we want an unbiased<br>generalization estimate<br><div class="Ih2E3d"></div></blockquote><div><br>Oh, no peeking here! You only use the labeled points to calculate the covariance matrix. In fact, you calculate a separate covariance matrix for each unique label! Then you use the appropriate covariance matrix when comparing an unknown sample to a known sample.
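<br><br>To make the no-peeking part concrete, here is a rough sketch (NumPy; the function names and the small ridge term are just for illustration, not the actual code):

```python
import numpy as np

def class_covariances(samples, labels):
    """Map each unique label to the (regularized) inverse covariance of its samples.

    Only labeled (training) samples ever enter here -- no peeking at test data.
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    inv_covs = {}
    for lab in np.unique(labels):
        group = samples[labels == lab]
        cov = np.cov(group, rowvar=False)
        # a small ridge keeps the inverse well-behaved when samples are scarce
        cov += 1e-6 * np.eye(cov.shape[0])
        inv_covs[lab] = np.linalg.inv(cov)
    return inv_covs

def mahalanobis_to_known(unknown, known, known_label, inv_covs):
    """Distance from an unlabeled sample to a known one, using the
    covariance matrix of the known sample's class."""
    diff = np.asarray(unknown, dtype=float) - np.asarray(known, dtype=float)
    return float(np.sqrt(diff.dot(inv_covs[known_label]).dot(diff)))
```

So the test point only ever appears on the "unknown" side; the covariances are frozen before it shows up.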
<br><br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> Then you can use these, supposedly<br>> better, kernels in place of any other kernel in any of the kernel
<br>> methods such as SVM, kernelized ridge regression, ...<br></div>I wonder if something like that hasn't been tried by someone already... seems like an<br>obvious thing to give a try ;-) although everything genius is supposed to
<br>be simple ;-)<br><div class="Ih2E3d"></div></blockquote><div><br>Folks have used the Mahalanobis distance for kNN before, sometimes with good results. It all depends on whether you actually have a skewed covariance matrix.
<br> <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="Ih2E3d"><br>> Given that you<br>> need to have more samples than features for mahalanobis to make much
<br>> sense, I would like to run this within a searchlight.<br></div>please correct me if I am wrong -- so the searchlight would actually operate<br>within the Cartesian coordinate system to select the neighbors, but then within
<br>that set of neighbors you compute the corresponding covariance (or something else)<br>which provides the matrix for the mahalanobis distance, which you would use inside<br>the classifier only (not to select voxels within a searchlight)<br><div class="Ih2E3d">
</div></blockquote><div><br>Yes, for example, I want to know the distance between two samples within that searchlight.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d"><br>> So, given all that, do you still think I should drop the<br>> mahalanobisDistance function into the metric.py code? I'll stick it<br>> in there for now so that you can see it.
<br></div>it would be great if you provided a test case for it. I think it can be<br>considerably reduced in size, thus increasing readability, but I am afraid<br>to break it. Those silly unittests help a bit ;-)<br>I think its place is in metric, but as I mentioned it might be better to
<br>reshape it: we can have a functor which is parametrized with x,y,w so<br>that the covariance is computed while initializing it. Then, in __call__ it<br>can take x,y=None and spit out a matrix of distances, or a scalar if there
<br>are only 2 points (ie x and y). This way it would satisfy the interface of<br>the other distance functions in there and would allow you to use it as<br>you intended. or am I wrong?</blockquote><div><br>I think that would work fine.
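<br><br>Something along these lines, maybe (just a sketch; the class name, the w handling, and the regularization are my guesses, not the real metric.py interface):

```python
import numpy as np

class MahalanobisDistance:
    """Functor: the covariance is fixed at construction, and __call__ matches
    the two-argument signature of the other distance functions."""

    def __init__(self, x, y=None, w=None):
        data = np.asarray(x, dtype=float)
        if y is not None:
            data = np.vstack([data, np.asarray(y, dtype=float)])
        if w is not None:                       # optional per-feature weights
            data = data * np.asarray(w, dtype=float)
        cov = np.cov(data, rowvar=False)
        cov += 1e-6 * np.eye(cov.shape[0])      # guard against a singular matrix
        self._icov = np.linalg.inv(cov)

    def __call__(self, x, y=None):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        y = x if y is None else np.atleast_2d(np.asarray(y, dtype=float))
        diffs = x[:, np.newaxis, :] - y[np.newaxis, :, :]           # (n, m, d)
        d2 = np.einsum('nmd,de,nme->nm', diffs, self._icov, diffs)  # quadratic forms
        dists = np.sqrt(np.maximum(d2, 0.0))
        # scalar for a single pair of points, matrix of distances otherwise
        return float(dists[0, 0]) if dists.size == 1 else dists
```

That way a single call gives either the scalar the existing metrics expect or the whole pairwise matrix in one vectorized shot.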
<br><br>Thanks for all the discussions on this :) I'm off to go collect some more fMRI datums...<br><br>P<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br><font color="#888888">--<br></font><div><div></div><div class="Wj3C7c">Yaroslav Halchenko<br>Research Assistant, Psychology Department, Rutgers-Newark<br>Student Ph.D. @ CS Dept. NJIT<br>Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
<br> 101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102<br>WWW: <a href="http://www.linkedin.com/in/yarik" target="_blank">http://www.linkedin.com/in/yarik</a><br><br>_______________________________________________
<br>Pkg-exppsy-maintainers mailing list<br><a href="mailto:Pkg-exppsy-maintainers@lists.alioth.debian.org">Pkg-exppsy-maintainers@lists.alioth.debian.org</a><br><a href="http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers" target="_blank">
http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-maintainers</a><br></div></div></blockquote></div><br>