<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Feb 25, 2015 at 5:15 PM, Nick Oosterhof <span dir="ltr"><<a href="mailto:n.n.oosterhof@googlemail.com" target="_blank">n.n.oosterhof@googlemail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

On 25 Feb 2015, at 15:49, gal star <<a href="mailto:gal.star3051@gmail.com">gal.star3051@gmail.com</a>> wrote:<br>

<br>

> I am doing a k-fold cross-validation on a dataset as follows:<br>

> 1. I'm partitioning the data myself - setting a train chunk ('0') and a test chunk ('1').<br>

> 2. Using clf.train() and then clf.predict()<br>

> 3. print the accuracy result and confusion matrix.<br>

> And I'm repeating this k times (by running the script attached k times) […]<br>

</span><span class="">> The standard deviation among the accuracy results produced when using the CrossValidation class, and the standard deviation among<br>

> the accuracy results obtained in the way I described, are different.<br>

<br>

</span>- your script really is quite messy (lots of code commented out, no documentation), which does not invite others to read and understand your code. Furthermore, it does not actually allow others to reproduce the issue. For future reference, it is helpful if you can provide a minimal running example so that others can reproduce what you reported.<br>

<br></blockquote><div><br></div><div>Of course, sorry about that. Here is the minimal running example:</div><div><br></div><div>import numpy</div><div>from mvpa2.suite import *</div><div><br></div><div>fds = fmri_dataset(samples='4D_scans.nii.gz')</div><div>zscore(fds, param_est=('targets', ['control']))</div><div># keep only the samples of the two classes of interest</div><div>mask = numpy.array([l in ['class A', 'class B'] for l in fds.sa.targets])</div><div>fds = fds[mask]</div><div><br></div><div>clf = FeatureSelectionClassifier(LinearCSVMC(), SensitivityBasedFeatureSelection(OneWayAnova(), FixedNElementTailSelector(1000, tail='upper', mode='select')))</div><div><br></div><div>nfold = NFoldPartitioner(attr='chunks')</div><div><br></div><div>< Python Code for selecting only '0' chunk for train and '1' for test> </div><div>clf.train(train)</div><div>print clf.predict(test.samples)</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- from what I understand from the script, you provide ‘fold’ as a parameter, but that parameter is actually not used in the script for your ‘manual’ cross-validation. In your manual cross-validation, you seem to always use chunk 0 for training and chunk 1 for testing.</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

- with k folds, the n-fold partitioner trains on (k-1) folds, which is more than the single fold you are training on if k>2. This will generally lead to more stable results and thus a lower standard deviation of accuracies.<br></blockquote><div><br></div><div>I am marking all k-1 training folds as '0' and only the one test fold as '1'.</div><div>The reason I'm doing that instead of using CrossValidation is that</div><div>I'm balancing the data by duplicating some datapoints from 'class B'.</div><div><br></div><div>I'm still missing the idea of errorfx.</div><div>Is it different since I'm running it manually?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


</blockquote></div><br></div></div>
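<div>For reference, the manual procedure discussed in this thread can be sketched as a single loop instead of k separate script runs. This is only a minimal illustration under stated assumptions, not the poster's script and not PyMVPA code: it is plain Python 3, the data are random toy values, and every function name (make_toy_data, balance_by_duplication, train_nearest_mean, mean_mismatch_error, manual_cross_validation) is hypothetical. It shows three points raised above: the fold index actually selects the test chunk, the duplication of 'class B' samples is applied to the training folds only, and an error function that returns the fraction of mismatches is simply 1 minus accuracy, so it has the same standard deviation across folds.</div>

```python
import random
import statistics


def make_toy_data(n_chunks=5, n_a=8, n_b=4, seed=0):
    """Return (chunk, label, value) samples; 'class B' is under-represented."""
    rng = random.Random(seed)
    data = []
    for chunk in range(n_chunks):
        data += [(chunk, 'class A', rng.gauss(0.0, 1.0)) for _ in range(n_a)]
        data += [(chunk, 'class B', rng.gauss(2.0, 1.0)) for _ in range(n_b)]
    return data


def balance_by_duplication(samples, rng):
    """Duplicate random minority-class samples until class counts are equal."""
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced += group + extra
    return balanced


def train_nearest_mean(samples):
    """Toy stand-in for a classifier: remember the mean value of each class."""
    sums = {}
    for _, label, value in samples:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(model, value):
    """Assign the class whose stored mean is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


def mean_mismatch_error(predicted, targets):
    """Fraction of wrong predictions, i.e. 1 - accuracy."""
    return sum(p != t for p, t in zip(predicted, targets)) / len(targets)


def manual_cross_validation(data, seed=0):
    """Leave-one-chunk-out loop; each chunk is the test set exactly once."""
    rng = random.Random(seed)
    errors = []
    for fold in sorted({sample[0] for sample in data}):
        train = [s for s in data if s[0] != fold]   # the other k-1 chunks
        test = [s for s in data if s[0] == fold]    # the held-out chunk
        train = balance_by_duplication(train, rng)  # balance training data only
        model = train_nearest_mean(train)
        predicted = [predict(model, value) for _, _, value in test]
        errors.append(mean_mismatch_error(predicted, [t for _, t, _ in test]))
    return errors


errors = manual_cross_validation(make_toy_data())
accuracies = [1.0 - e for e in errors]
print('per-fold accuracy:', accuracies)
print('std of accuracy:', statistics.stdev(accuracies))
```

<div>Balancing inside the loop keeps duplicated samples out of the held-out chunk; duplicating before the split could place copies of the same datapoint in both the training and the test set.</div>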