[pymvpa] GPR woes

Per B. Sederberg persed at princeton.edu
Sat Mar 28 18:41:12 UTC 2009


Here's the summary:

In [41]: print dat.summary()
Dataset / float32 2142 x 29155
uniq: 6 chunks 2 labels
stats: mean=-4.0314e-06 std=0.916641 var=0.84023 min=-1.23648 max=43.8952

Counts of labels in each chunk:
  chunks\labels  0   1
                --- ---
       0        173 184
       1        168 189
       2        184 173
       3        188 169
       4        173 184
       5        196 161

Summary per label across chunks
  label mean  std min max #chunks
   0     180 9.81 168 196    6
   1     177 9.81 161 189    6

Summary per chunk across labels
  chunk mean  std min max #labels
   0     178  5.5 173 184    2
   1     178 10.5 168 189    2
   2     178  5.5 173 184    2
   3     178  9.5 169 188    2
   4     178  5.5 173 184    2
   5     178 17.5 161 196    2


It does look like there are some outliers in there based on the min
and max above (a max near 44 after zscoring is extreme).  Note that I
have already zscored the data at this point.
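
Just to quantify that, here is a quick sketch for counting how many
values fall outside a given z-score range (assuming dat.samples holds
the zscored data, and an arbitrary threshold of 5):

import numpy as N

# count feature values more than 5 SDs out; after zscoring,
# anything this extreme is a candidate outlier
thresh = 5.0
nout = (N.abs(dat.samples) > thresh).sum()
print '%d of %d values beyond +/-%g' % (nout, dat.samples.size, thresh)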

I'm getting some really good classification now with the
regularization parameter in there.  At some point I may try finding
the outliers and explicitly removing them, and then see whether I
still need the regularization.  The good news is that the
regularization seems to have no effect on the chunks of the data that
didn't need it before, so I'm not really hurting anything by having it
there (at least for this dataset).
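
For anyone curious, the effect of the parameter is just a Tikhonov
ridge added to the diagonal of the kernel matrix before the Cholesky
factorization; stripped of the PyMVPA specifics, it amounts to
something like this sketch:

import numpy as N
from scipy.linalg import cholesky

def regularized_cholesky(C, regularization=0.0):
    # add regularization to the diagonal of the kernel matrix C, then
    # factor; larger values make C "more positive definite"
    return cholesky(C + regularization * N.eye(C.shape[0]), lower=True)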

Best,
P

2009/3/28 Yaroslav Halchenko <debian at onerussian.com>:
> Hypothesis:
> I think that one of the possible causes of such behavior is features
> that are not normed to lie in approximately the same range... assume
> you have one feature that is around 100 while the rest are "proper"
> ones in [-5, 5] (zscores?), or you have some outliers/excursions from
> this 'normed' case.  Then, in the summations/dot-products, that heavy
> feature would swamp the effect of variance in the "proper" features.
>
> What is the histogram of your features? or just
>
> print dataset.summary()
>
> ?
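>
> or, for an actual look at the distribution, a quick matplotlib sketch
> (assuming dataset.samples as in PyMVPA):
>
> import pylab as P
> # histogram of all feature values, pooled over samples and features
> P.hist(dataset.samples.ravel(), bins=100)
> P.show()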
>
> On Sat, 28 Mar 2009, Per B. Sederberg wrote:
>
>> Howdy all:
>
>> I've got it working for my data (I had to add what I thought was a
>> pretty hefty regularization of 0.001 before it stopped giving me the
>> error, and it turns out I get the same classification performance
>> even with a value of 2.0).
>
>> I made it so that regularization is always applied, but that it
>> defaults to 0.0, so the default behavior of the classifier is
>> unchanged.  Here's the commit message:
>
>> --------------------
>> I'm not sure what to call it, so I put it all.  Anyway, I just added
>> the regularization parameter to be used all the time by GPR.  It
>> still defaults to 0.0, so there should be no change for people
>> currently using GPR with no issues.  However, this enables setting
>> the regularization manually if you run into the "not positive
>> definite" error.  I figured that if you were going to use
>> regularization, you would want to use it all the time and not have
>> different values on different folds of your data.
>> --------------------
>
>
>> Thanks for all the input today.  I now have working GPR classification :)
>
>> Best,
>> Per
>
>
>> On Sat, Mar 28, 2009 at 12:48 PM, Per B. Sederberg <persed at princeton.edu> wrote:
>> > On Sat, Mar 28, 2009 at 12:06 PM, Scott Gorlin <gorlins at mit.edu> wrote:
>> >> you could try 10x the calculated eps :)
>
>> >> Depending on how long it takes to run your classifier, why don't you try
>> >> varying epsilon logarithmically and plotting the error as a function of
>> >> epsilon.  Most likely you'll see a local min around some optimum value of
>> >> regularization, and then it will plateau out as epsilon increases - this
>> >> should, at least, convince you that a high epsilon isn't that bad (if it is,
>> >> then your data probably aren't as separable as they should be...)
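>
>> >> Sketched out, that sweep might look something like this (run_cv here
>> >> is just a placeholder for whatever cross-validation call you already
>> >> have, not a PyMVPA function):
>
>> >> import numpy as N
>> >> import pylab as P
>
>> >> epsilons = N.logspace(-20, 1, 22)   # 1e-20 up to 10
>> >> errors = [run_cv(eps) for eps in epsilons]
>> >> P.semilogx(epsilons, errors)   # look for a local min, then a plateau
>> >> P.xlabel('epsilon'); P.ylabel('Xfold error')
>> >> P.show()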
>
>
>> > A great idea!  I'll expose the regularization parameter as a variable
>> > to GPR and try it out.
>
>> >> All that a high epsilon does in Tikhonov regularization is penalize
>> >> steep hyperplanes, since it adds the norm of the function to the loss,
>> >> so the results will be smoother (and theoretically generalize better).
>> >> So it should be set so that epsilon times the function norm is
>> >> ballpark equal to the raw classifier loss, so that it actually makes a
>> >> difference (though I don't know the exact form in the GPR
>> >> implementation; I'm just trying to say that there is an effective
>> >> range which isn't necessarily approaching 0).  As long as Xfold
>> >> validation still works with a high epsilon, there's nothing to worry
>> >> about - in fact, that's what it's designed to do.
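>
>> >> In equation form (generic Tikhonov/ridge, not necessarily the exact
>> >> objective inside GPR): minimize ||y - f||^2 + epsilon*||f||^2, whose
>> >> kernelized solution is alpha = (K + epsilon*I)^-1 y, i.e.:
>
>> >> import numpy as N
>> >> from scipy.linalg import solve
>> >> # regularized kernel solution: eps shifts K's eigenvalues up, which
>> >> # both smooths the fit and makes the system well-conditioned
>> >> alpha = solve(K + eps * N.eye(K.shape[0]), y)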
>
>
>> > So this raises the question of whether we should have some level of
>> > regularization all the time.  The main difficulty would be figuring
>> > out a ballpark value for different datasets (i.e., whether there is
>> > some principled first pass for setting it based on the features of
>> > your data).
>
>> > Thanks for all the info,
>> > Per
>
>
>> >> if you still get a linalg error with very high epsilon, then there's
>> >> probably a different problem ;)
>
>> >> Per B. Sederberg wrote:
>
>> >>> hi s&e:
>
>> >>> this is almost exactly what I did, but I still get the same error.  in
>> >>> numpy you get the precision with:
>
>> >>> np.finfo(np.float64).eps
>
>> >>> I tried using that value, and also tried checking the eps of the C
>> >>> matrix's actual dtype, as well as the float32 eps, which is more than
>> >>> twice the float64 eps.  all to no avail, so i'm worried my data will
>> >>> need a much larger eps.  what are the bad consequences I could run
>> >>> into by adding in a major regularization?
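>
>> >>> for concreteness, the two machine epsilons in question:
>
>> >>> import numpy as np
>> >>> print np.finfo(np.float64).eps   # ~2.2204e-16
>> >>> print np.finfo(np.float32).eps   # ~1.1921e-07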
>
>> >>> thanks,
>> >>> p
>
>> >>> ps: in case it's informative, I have over 1000 samples so the kernel
>> >>> matrix is quite large...
>
>> >>> On 3/28/09, Scott Gorlin <gorlins at mit.edu> wrote:
>
>
>> >>>> I haven't checked the python syntax, but in Matlab, the eps function
>> >>>> can take a number and return the smallest representable numerical
>> >>>> difference at that level.  If there is an equivalent in numpy, then
>> >>>> (say) 2x that value would be a good alternative default epsilon
>> >>>> (calculated at train time).
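>
>> >>>> (numpy does have one, at least in newer versions: np.spacing(x)
>> >>>> returns the smallest representable difference at x, just like
>> >>>> Matlab's eps(x), e.g. np.spacing(1.0) == np.finfo(np.float64).eps)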
>
>> >>>> Emanuele Olivetti wrote:
>
>
>> >>>>> Dear Per and Scott,
>
>> >>>>> You are both right, and I feel quite guilty since I did not spend
>> >>>>> time on GPR as I promised a long time ago (basically for lack of
>> >>>>> time).  The epsilon issue is exactly what Scott says: I discussed
>> >>>>> the same issue with Yarick some time ago but then did not work out
>> >>>>> a principled solution.  I'm fine if you patch the current code to
>> >>>>> increase epsilon up to a certain extent.  Any ideas about
>> >>>>> alternative/principled solutions?
>
>> >>>>> Moreover, I'm (slowly) chasing another numerical instability, due,
>> >>>>> I guess, to underflow in summation, which could affect the marginal
>> >>>>> likelihood computation (which makes it highly relevant for GPR).
>> >>>>> This is more likely to happen when the dataset is large, so be
>> >>>>> prepared for that ;-)
>
>> >>>>> As far as I remember, I forced the kernel matrix to be double
>> >>>>> precision for exactly the reasons you mentioned, but checking this
>> >>>>> fact more carefully is highly desirable.
>
>> >>>>> Unfortunately I'm not going to work on GPR any time soon, even
>> >>>>> though I'd love to.  So any patch is warmly welcome, as are bug
>> >>>>> reports ;-)
>
>> >>>>> Best,
>
>> >>>>> Emanuele
>
>
>> >>>>> Per B. Sederberg wrote:
>
>
>> >>>>>> Hi Scott:
>
>> >>>>>> You are correct on all counts.  epsilon is currently hard-coded,
>> >>>>>> and either of your proposed solutions would help make GPR run more
>> >>>>>> consistently.  I'll try out some combination of both and report
>> >>>>>> back.
>
>> >>>>>> Best,
>> >>>>>> Per
>
>> >>>>>> On Sat, Mar 28, 2009 at 9:46 AM, Scott Gorlin <gorlins at mit.edu> wrote:
>
>
>
>> >>>>>>> Epsilon is the Tikhonov regularization parameter, no?  (Haven't
>> >>>>>>> read much about this classifier, but it looks similar to the RLS
>> >>>>>>> solution to me...)  As such it should be adjustable to enable
>> >>>>>>> punishing overfit solutions, so maybe it should actually be a
>> >>>>>>> formal parameter.
>
>> >>>>>>> If the default returns a LinAlg error, you could also stick the
>> >>>>>>> cholesky decomposition in a finite loop where epsilon is doubled
>> >>>>>>> each time (though not too many iterations...)
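>
>> >>>>>>> e.g. something along these lines (a rough sketch, not tested;
>> >>>>>>> self._C is the kernel matrix and SLcholesky/SLAError are the
>> >>>>>>> scipy.linalg names already imported in gpr.py):
>
>> >>>>>>> eps = N.finfo(self._C.dtype).eps
>> >>>>>>> for attempt in range(10):   # bounded number of retries
>> >>>>>>>     try:
>> >>>>>>>         self._L = SLcholesky(self._C
>> >>>>>>>                              + eps * N.eye(self._C.shape[0]),
>> >>>>>>>                              lower=True)
>> >>>>>>>         break
>> >>>>>>>     except SLAError:
>> >>>>>>>         eps *= 2.0           # double the jitter and retry
>> >>>>>>> else:
>> >>>>>>>     raise SLAError("not positive definite even after jitter")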
>
>> >>>>>>> On a wild tangent, also check that your kernel matrix (here
>> >>>>>>> self._C) is double precision, since 1e-20 is close to the
>> >>>>>>> precision limit for single floats.
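>
>> >>>>>>> (quick check: print self._C.dtype; if it comes back float32, cast
>> >>>>>>> with self._C = self._C.astype(N.float64) before factoring)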
>
>> >>>>>>> Per B. Sederberg wrote:
>
>
>
>> >>>>>>>> Hi Everybody (and Emanuele in particular):
>
>> >>>>>>>> I've tried GPR on about 5 different datasets and I've never actually
>> >>>>>>>> gotten it to run through an entire cross validation process.  I
>> >>>>>>>> always
>> >>>>>>>> get the following error:
>
>> >>>>>>>> /home/per/lib/python/mvpa/clfs/gpr.pyc in _train(self, data)
>> >>>>>>>>    308             except SLAError:
>> >>>>>>>>    309                 epsilon = 1.0e-20 * N.eye(self._C.shape[0])
>> >>>>>>>> --> 310                 self._L = SLcholesky(self._C + epsilon, lower=True)
>> >>>>>>>>    311                 self._LL = (self._L, True)
>> >>>>>>>>    312                 pass
>
>> >>>>>>>> /usr/lib/python2.5/site-packages/scipy/linalg/decomp.pyc in cholesky(a, lower, overwrite_a)
>> >>>>>>>>    552     potrf, = get_lapack_funcs(('potrf',),(a1,))
>> >>>>>>>>    553     c,info = potrf(a1,lower=lower,overwrite_a=overwrite_a,clean=1)
>> >>>>>>>> --> 554     if info>0: raise LinAlgError, "matrix not positive definite"
>> >>>>>>>>    555     if info<0: raise ValueError,\
>> >>>>>>>>    556        'illegal value in %-th argument of internal potrf'%(-info)
>
>> >>>>>>>> LinAlgError: matrix not positive definite
>
>> >>>>>>>> I think Yarik mentioned changing epsilon or something like that,
>> >>>>>>>> but it seems to me that having to edit the source code just to
>> >>>>>>>> keep a classifier from crashing is not ideal.
>
>> >>>>>>>> Do you have any suggestions for how to fix this?  I've been
>> >>>>>>>> tantalized by some very good transfer errors in the folds that
>> >>>>>>>> GPR was able to complete before crashing.  Is there a permanent
>> >>>>>>>> change we could make to the GPR code to make it more stable
>> >>>>>>>> across datasets?  Or could we turn epsilon into a configurable
>> >>>>>>>> variable when setting up the classifier?  If so, what should I
>> >>>>>>>> change epsilon to in order to see if I can get it to run?
>
>> >>>>>>>> Thanks,
>> >>>>>>>> Per
>
>
>
> --
> Yaroslav Halchenko
> Research Assistant, Psychology Department, Rutgers-Newark
> Student  Ph.D. @ CS Dept. NJIT
> Office: (973) 353-1412 | FWD: 82823 | Fax: (973) 353-1171
>        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
> WWW:     http://www.linkedin.com/in/yarik
>


