<div dir="ltr">Thanks for your feedback!<div><br></div><div>1. Nested cross-validation is done across subjects. For a set of 28 subjects (16 (gr1) + 12 (gr2)), 27 subjects are selected and the SMLR accuracies are computed for several lms (e.g., 0.2, 0.5, 1.0) using a leave-one-pair-of-subjects-<wbr>from-different-groups cross-validation (so train on 25 subjects, test on 2). This procedure is repeated 27 times (so one different subject is out for each round of NCV). The mean accuracy is computed across all splits within each round across all 27 rounds. The lm corresponding to the highest mean accuracy is selected as the best parameter and used in the full sample (n=28) SMLR.</div><div><br></div><div>2. It would be great to balance training sets also (to have equal number per group), but I was not sure if one set of randomly selected subjects from a large group (that is equal to the size of the smaller group) would do, or if the procedure should be repeated multiple times (to make sure that all subjects were included across training sets). I am also not sure how to incorporate generation of equal random sets to the scripts. </div><div><br></div><div>Processing time may become an issue. It takes slightly more than 2 min to run it for 28 subjects (given 192 splits produced by ChainNode). With more training sets (and potentially more subjects or even more unbalanced datasets) the processing time can become infinitely longer...</div><div><br></div><div>The script with Monte Carlo testing (with 1000 repetitions) described in my first e-mail never finished running (well I stopped it after a day or so). There should be some more efficient way of creating a null distribution for unbalanced datasets...</div><div><br></div><div><div>3. Below is the output of </div><div>print cv.ca.stats.matrix</div><div>[[141 68]</div><div> [ 51 124]]</div></div><div><br></div><div> Note there are 192 splits.</div><div><br></div><div>ACC%=69.01%</div><div>TPR "better" = 0.73</div><div>TPR "worse" = 0.65</div><div><br></div><div>Confidence intervals (computed using BinomialProportionCI(width=.95, meth='jeffreys')) are<br></div><div>64.3%< ACC% < 73.5%.</div><div><br></div><div><br></div><div>4. So that new error function in <span style="font-size:12.8px"> </span><a href="https://github.com/PyMVPA/PyMVPA/pull/541" rel="noreferrer" target="_blank" style="font-size:12.8px">https://github.com/PyMVPA/<wbr>PyMVPA/pull/541</a> computes a mean of TPRs?</div><div><br></div><div>Thank you,</div><div>Anna.</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 17, 2017 at 4:59 PM, Yaroslav Halchenko <span dir="ltr"><<a href="mailto:debian@onerussian.com" target="_blank">debian@onerussian.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
On Wed, 11 Oct 2017, Anna Manelis wrote:

> Dear Experts,

> I have an unbalanced dataset (group1=16 subjects ('better'), group2=12
> subjects ('worse')).

is there any grouping of subjects, or does each have a dedicated "chunk"?

> I would like to perform Monte-Carlo testing of the SMLR
> between-subject classification analysis (performed on cope images). I used
> the permutation_test.py script as an example and tried to adjust it for my
> unbalanced dataset.

> Any feedback on whether the script below makes sense is greatly appreciated.

well -- all the disbalanced cases are "tricky" to say the least ;)

I think that, for peace of mind, you should also try your pipeline on
totally random data (ds.samples = np.random.normal(size=ds.shape)) to make
sure that "garbage in, garbage out" holds and nothing comes out "significant"
(well, it should in only about "p" of the cases ;))
<span class=""><br>
> lm = 0.2 # The value was determined through the nested cross-validation<br>
> procedure<br>
<br>
</span>if it was nested, you were most likely to end up with different lm per<br>
each split. or in other words: how was it done and among which lm did<br>
you check? Feels like some double dipping might be going on<br>
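For reference, a per-fold selection could look roughly like the sketch below (candidate_lms, the helper name, and the use of mean_match_accuracy from mvpa2.suite are just illustrative assumptions, not necessarily what you did):

    import numpy as np
    from mvpa2.suite import CrossValidation, NFoldPartitioner, SMLR, \
        mean_match_accuracy

    candidate_lms = [0.2, 0.5, 1.0]

    def pick_lm(train_ds):
        # inner CV restricted to the training portion of the outer split only,
        # so the held-out subjects never influence the choice of lm
        accs = []
        for lm in candidate_lms:
            inner_cv = CrossValidation(SMLR(lm=lm), NFoldPartitioner(),
                                        errorfx=mean_match_accuracy)
            accs.append(np.mean(inner_cv(train_ds).samples))
        return candidate_lms[int(np.argmax(accs))]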
<span class=""><br>
> clf = SMLR(lm=lm, fit_all_weights=False, descr='SMLR(lm=%g)' % lm)<br>
<br>
> ### how often do we want to shuffle the data<br>
> repeater = Repeater(count=200)<br>
<br>
> ### permute the training part of a dataset exactly ONCE<br>
> permutator = AttributePermutator('targets', limit={'partitions': 1},<br>
> count=1)<br>
<br>
> ### Make sure that each group ("better" and "worse") has the same number of<br>
> samples<br>
> par = ChainNode([NFoldPartitioner(<wbr>cvtype=2, attr='chunks'),<br>
>Â Â Â Â Â Â Â Â Â Â Sifter([('partitions', 2),<br>
>Â Â Â Â Â Â Â Â Â Â Â Â Â Â ('targets', dict(uvalues=['better', 'worse'],<br>
>Â Â Â Â Â Â Â Â balanced=True))])<br>
<br>
</span>so, here you are saying to select the partitioning where you have<br>
balanced # of samples in the testing set... so training set will remain<br>
dis-balanced. That would lead SMLR to tend to classify as the "better".<br>
How does your cv.ca.stats look like -- is my fear fulfilled?<br>

here is an example:

In [51]: import mvpa2.suite as mv

In [52]: cv = mv.CrossValidation(mv.SMLR(), mv.NFoldPartitioner(), errorfx=mv.mean_mismatch_error, enable_ca=['stats'])

In [53]: ds = mv.normal_feature_dataset(perlabel=10, nlabels=2, nfeatures=4, nonbogus_features=[0,1], snr=0)

In [54]: res = cv(ds); print cv.ca.stats # on random data -- near chance result
----------.
predictions\targets  L0  L1
      `------       --- ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR  MCC   F1  AUC
        L0            5   6  11   9   6   5 0.45 0.44  0.5  0.4 0.55 -0.1 0.48  0.3
        L1            5   4   9  11   5   6 0.44 0.45  0.4  0.5 0.56 -0.1 0.42  0.3
Per target:          --- ---
        P            10  10
        N            10  10
        TP            5   4
        TN            4   5
Summary \ Means:     --- ---  10  10 5.5 5.5 0.45 0.45 0.45 0.45 0.55 -0.1 0.45  0.3
       CHI^2         0.4 p=0.94
        ACC          0.45
        ACC%         45
     # of sets        5  ACC(i) = 0.6-0.075*i p=0.32 r=-0.57 r^2=0.32

In [55]: ds_ = ds[[0, 2, 4, 6, 8, 10] + range(11, 20)]

In [56]: print ds_.summary()
Dataset: 15x4@float64, <sa: chunks,targets>, <fa: nonbogus_targets>, <a: bogus_features,nonbogus_features>
stats: mean=0.00169964 std=0.315902 var=0.0997943 min=-0.600959 max=0.737558

Counts of targets in each chunk:
 chunks\targets  L0  L1
                --- ---
       0          1   2
       1          1   2
       2          1   2
       3          1   2
       4          1   2

Summary for targets across chunks
 targets mean std min max #chunks
   L0      1   0   1   1     5
   L1      2   0   2   2     5

Summary for chunks across targets
 chunks mean std min max #targets
   0     1.5 0.5   1   2      2
   1     1.5 0.5   1   2      2
   2     1.5 0.5   1   2      2
   3     1.5 0.5   1   2      2
   4     1.5 0.5   1   2      2
Sequence statistics for 15 entries from set ['L0', 'L1']
Counter-balance table for orders up to 2:
Targets/Order  O1    |  O2    |
    L0:       4  1   | 3  2   |
    L1:       0  9   | 0  8   |
Correlations: min=-0.5 max=0.7 mean=-0.071 sum(abs)=5.8

In [57]: res = cv(ds_); print cv.ca.stats # on random and disbalanced data
----------.
predictions\targets  L0  L1
      `------       --- ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR   MCC   F1  AUC
        L0            1   3   4  11   3   4 0.25 0.64  0.2  0.7 0.75 -0.11 0.22  0.6
        L1            4   7  11   4   4   3 0.64 0.25  0.7  0.2 0.36 -0.11 0.67  0.6
Per target:          --- ---
        P             5  10
        N            10   5
        TP            1   7
        TN            7   1
Summary \ Means:     --- --- 7.5 7.5 3.5 3.5 0.44 0.44 0.45 0.45 0.56 -0.11 0.44  0.6
       CHI^2         3.4 p=0.33
        ACC          0.53
        ACC%         53.33
     # of sets        5  ACC(i) = 0.67-0.067*i p=0.65 r=-0.28 r^2=0.08


# although the overall accuracy might be low, you can see that the classifier tends to misclassify into L1


and the overall mean accuracy is a "suboptimal" measure in this case.

What we could have estimated instead is the mean of the accuracies per
class -- that one behaves better in the case of disbalanced datasets.

So in this case, with mean_mismatch_error (mean_match_accuracy reported
in the ca.stats) we are getting

In [63]: (1+7)/(5.+10)
Out[63]: 0.5333333333333333

whereas for such cases we should use (and maybe in general, while also
taking care about chunks, since splits could be disbalanced differently;
in this example I just assume a single "measurement"):

In [62]: np.mean([1/5., 7/10.])
Out[62]: 0.44999999999999996

although the numbers here are not really demonstrative, so let's consider a more
critical case, where all the samples end up assigned to the "majority"
class:

In [70]: cm = mv.ConfusionMatrix()

In [71]: cm.add(['L1']*5 + ['L2']*10, ['L2'] * 15)

In [72]: print cm
----------.
predictions\targets  L1  L2
      `------       --- ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR  MCC   F1
        L1            0   0   0  15   0   5  nan 0.67    0    1  nan    0    0
        L2            5  10  15   0   5   0 0.67  nan    1    0 0.33    0  0.8
Per target:          --- ---
        P             5  10
        N            10   5
        TP            0  10
        TN           10   0
Summary \ Means:     --- --- 7.5 7.5 2.5 2.5  nan  nan  0.5  0.5  nan    0  0.4
       CHI^2         15 p=0.0018
        ACC          0.67
        ACC%         66.67
     # of sets        1


whohoo -- 66% correct!
but if we compute the mean of per-class accuracies:

In [73]: np.mean([0/5., 10/10.])
Out[73]: 0.5

1. Unfortunately "lazy" us didn't implement such a dedicated error
   function yet, BUT you already have that value as a part of the stats:

   In [84]: print cm.stats['mean(TPR)']
   0.5

   which is the mean of the TPRs per each label, which in turn is what
   we are interested in here

2. I am not quite certain, but your above approach (though see the comment
below about postproc) may not be completely without ground. Since you make
sure that the test set always has one sample of each class, the training
set is always consistently disbalanced in the same way, so your null
distribution should account for the "tend to classify as the majority
class" bias, and I expect the p value for the true classification
performance to be "legit".

what does your chance distribution look like? (distr_est.ca.dist_samples)
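e.g., something like this quick look (a sketch, assuming matplotlib/pylab is available):

    import numpy as np
    import pylab as pl
    # null distribution of the error across the 200 permutations
    null_errs = np.asarray(distr_est.ca.dist_samples).ravel()
    pl.hist(null_errs, bins=30)
    pl.show()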

meanwhile I will prep the "mean_tpr" function (which should match
mean_match_accuracy in the balanced case) and "mean_fnr" (which should match
mean_mismatch_error in the balanced case) (if I didn't mix anything
up) to use in such cases.
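Roughly, such an errorfx could look something like the sketch below (the actual implementation may end up different):

    import numpy as np

    def mean_tpr(predicted, targets):
        # mean of per-class accuracies: each class contributes equally,
        # no matter how many samples it has
        predicted = np.asanyarray(predicted)
        targets = np.asanyarray(targets)
        return np.mean([np.mean(predicted[targets == t] == t)
                        for t in np.unique(targets)])

    # for the degenerate example above:
    # mean_tpr(['L2']*15, ['L1']*5 + ['L2']*10) -> 0.5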
<span class=""><br>
<br>
> ### I believe this will apply permutator on each fold created by par.<br>
> par_perm = ChainNode([par, permutator], space=par.get_space())<br>
<br>
> # CV with null-distribution estimation that permutes the training data for<br>
> # each fold independently<br>
> null_cv = CrossValidation(<br>
>Â Â Â Â Â Â Â clf,<br>
>Â Â Â Â Â Â Â par_perm,<br>
>Â Â Â Â Â Â Â errorfx=mean_mismatch_error)<br>
<br>
</span>such null_cv and cv below will provide err per each split (do print err to see<br>
that). You should add  postproc=mean_sample()  to both of them.<br>
<div class="HOEnZb"><div class="h5"><br>
> # Monte Carlo distribution estimator<br>
> distr_est = MCNullDist(repeater, tail='left', measure=null_cv,<br>
>Â Â Â Â Â Â Â Â Â Â Â Â enable_ca=['dist_samples'])<br>
<br>
> # actual CV with H0 distribution estimation<br>
> cv = CrossValidation(clf, par, errorfx=mean_mismatch_error,<br>
>Â Â Â Â Â Â Â Â Â Â Â null_dist=distr_est, enable_ca=['stats'])<br>
<br>
<br>
> err = cv(ds_mni_bw)<br>
<br>
> p = err.ca.null_prob<br>
> np.asscalar(p)<br>
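i.e., with the postproc=mean_sample() added as suggested above, those definitions would become something like (mean_sample comes with mvpa2.suite):

    null_cv = CrossValidation(clf, par_perm,
                              postproc=mean_sample(),
                              errorfx=mean_mismatch_error)
    distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                           enable_ca=['dist_samples'])
    cv = CrossValidation(clf, par,
                         postproc=mean_sample(),
                         errorfx=mean_mismatch_error,
                         null_dist=distr_est, enable_ca=['stats'])

so that err (and each sample of the null distribution) is a single mean value rather than one value per split.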

--
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik