<div dir="ltr"><div><div><div>Any ideas?<br></div>Or is the list no longer working?<br></div>No thread has been archived for 3 weeks...<br><br></div>Thanks.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 12, 2016 at 4:54 PM, basile pinsard <span dir="ltr">&lt;<a href="mailto:basile.pinsard@gmail.com" target="_blank">basile.pinsard@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div>Hi everybody,<br><br></div>It seems pretty simple, but I cannot find a way to build a sensible 2-fold partitioner with balanced targets when feeding it a balanced dataset (4 classes x 8 samples).<br></div><div>Optionally, it would be good if all samples were used for testing the same number of times.<br></div><div><br>ChainNode(<br>    [ NFoldPartitioner(<br>        attr='chunks',<br>        cvtype=.5,<br>        count=128,<br>        selection_strategy='random'),<br>      Sifter([('partitions', 2),('targets', dict(balanced=True)) ]) ])<br></div><div>This does generate balanced partitions, but it yields a variable number of CV folds, which is a problem.<br><br><br>ChainNode(<br>    [ NFoldPartitioner(<br>        attr='chunks',<br>        cvtype=.5,<br>        count=32,<br>        selection_strategy='random'),<br>      Balancer(<br>            amount='equal',<br>            attr='targets',<br>            count=1,<br>            apply_selection=False,
<br>            limit=['partitions'],<br>            include_offlimit=True)])<br></div><div>This does balance the partitions by eliminating some samples, but it reduces the number of samples in the training/testing sets, and not consistently across folds.<br><br>FactorialPartitioner does the job, but its count parameter is not working (the generate method is overridden), so its combinatorial enumeration yields thousands of splits, which is a bit much.<br><br></div><div>Would there be a way to repeatedly split at random, taking half of each class's samples in each of the two partitions?<br></div><div>Or should we maybe make FactorialPartitioner respect the Partitioner prototype (count/strategy parameters)?<br></div><div><br></div><div>Thanks.<span class="HOEnZb"><font color="#888888"><br><br></font></span></div><span class="HOEnZb"><font color="#888888"><div>basile<br></div></font></span></div>
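<div>To make the last question concrete, here is a minimal sketch in plain Python of the kind of splitting I have in mind. It does not use the PyMVPA API: <code>balanced_half_splits</code> and its parameters are hypothetical names made up for illustration, and it assumes every class has an even number of samples.<br></div>
<pre>
```python
import random
from collections import defaultdict


def balanced_half_splits(targets, n_splits, seed=None):
    """Yield n_splits pairs (first_half, second_half) of sample indices.

    Hypothetical helper, not part of PyMVPA. Within each class the
    samples are shuffled and cut in the middle, so each half gets
    half of every class (with an odd class size, the second half
    gets the extra sample).
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, target in enumerate(targets):
        by_class[target].append(idx)

    for _ in range(n_splits):
        first, second = [], []
        for indices in by_class.values():
            shuffled = indices[:]  # copy, so the grouping stays intact
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            first.extend(shuffled[:half])
            second.extend(shuffled[half:])
        yield sorted(first), sorted(second)


# 4 classes x 8 samples, as in the dataset described above.
targets = [c for c in "ABCD" for _ in range(8)]
splits = list(balanced_half_splits(targets, n_splits=32, seed=0))
```
</pre>
<div>Training on one half and testing on the other, then swapping the roles, gives two folds per split, so every sample is tested exactly once per split and the total number of folds is fixed at 2 * n_splits.<br></div>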
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div><font size="1">Basile Pinsard<br></font></div><i><font size="1">PhD candidate, <br></font></i></div><font size="1">Laboratoire d'Imagerie Biomédicale, UMR S 1146 / UMR 7371, Sorbonne Universités, UPMC, INSERM, CNRS</font><br><font size="1"><span><span style="color:#333333"><span style="font-family:Arial,serif"><span lang="en-GB"><em>Brain-Cognition-Behaviour Doctoral School </em></span></span></span><span style="color:#333333"><span style="font-family:Arial,serif"><span lang="en-GB"><strong>, </strong>ED3C<strong>, </strong>UPMC, Sorbonne Universités<br>Biomedical Sciences Doctoral School, Faculty of Medicine, Université de Montréal <br></span></span></span></span></font><font size="1">CRIUGM, Université de Montréal</font><br></div></div></div></div>
</div>