<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Matteo,<br>
thank you again for your invaluable help!<br>
    <br>
    I have spotted a small typo in your gist. Line 105 should read:<br>
        for tm in tms_parallel:<br>
    <br>
    <br>
Apart from the problems with the nested_cv.py example that you
refer to, I am experiencing some further troubles:<br>
    - backend='threading' does not seem to parallelize on my machine,
    only backend='multiprocessing'<br>
    - while your 'nested_cv_parallel.py' gist runs smoothly, when I
    adapt it to my dataset, partitioner, etc., I get the following
    error:<br>
    <br>
        tms_parallel, best_clfs_parallel = zip(*out_parallel)<br>
    TypeError: zip argument #1 must support iteration<br>
    <br>
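    In case it helps to narrow it down: that TypeError means that at least one
    element of out_parallel does not support iteration, e.g. because the
    parallelized helper returned a single object (or None) for some split
    instead of a (tm, best_clf) pair, so zip(*out_parallel) cannot unpack it.
    A minimal sketch of the shape zip expects (the helper body here is only a
    placeholder, not my real function):<br>
    <pre>
# each parallel call must return a *pair* for zip(*out_parallel) to unpack
def _run_one_partition(isplit, partitions):
    # ... the real nested model selection for this split goes here ...
    tm = 'transfer measure for split %i' % isplit         # placeholder
    best_clf = 'best classifier for split %i' % isplit    # placeholder
    return tm, best_clf   # returning only tm (or None) breaks zip(*out_parallel)

out_parallel = [_run_one_partition(i, None) for i in range(3)]
tms_parallel, best_clfs_parallel = zip(*out_parallel)
    </pre>
    <br>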
    <br>
    I guess I am putting my nested analysis on standby for the moment.
    Hopefully these issues will soon be resolved.<br>
    <br>
    Thank you for everything!<br>
    Marco<br>
    <br>
    <br>
    <blockquote type="cite">On 28/11/2017 15:45, Matteo Visconti di
      Oleggio Castello wrote:<br>
      <br>
      Hi Marco,<br>
      <br>
      I think there are a bunch of conflated issues here.<br>
      <br>
      - First, there was an error in my code, and that's why you got the<br>
      error " UnboundLocalError:<br>
      local variable 'best_clf' referenced before assignment". I updated
      the<br>
      gist, and now the example code for running the parallelization
      should be ok<br>
      and should work as a blueprint for your code (<br>
      <a class="moz-txt-link-freetext" href="https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0">https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0</a>).<br>
      <br>
      - You are correct in changing the backend to 'threading' for this<br>
      particular case because of the pickling error.<br>
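      <br>
      A minimal sketch of that pattern, following the snippet in your earlier
      email (with the threading backend the results are no longer pickled and
      sent across processes, which is what triggers the SwigPyObject error;
      whether it also gives a real speed-up depends on how much of the work
      releases the GIL):<br>
      <pre>
from sklearn.externals.joblib import Parallel, delayed
from sklearn.externals.joblib.parallel import parallel_backend

# run the per-partition jobs in threads instead of subprocesses
with parallel_backend('threading'):
    tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit, partitions)
                             for isplit, partitions
                             in enumerate(partitionerCD.generate(fds)))
      </pre>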
      <br>
      - However, I think that the example for nested_cv.py didn't work
      from the<br>
      start, even without parallelization. The last change was 6 years
      ago, and<br>
      I'm afraid that things have changed in the meantime and the code wasn't
      updated. I opened an issue on GitHub to keep track of it (<br>
      <a class="moz-txt-link-freetext" href="https://github.com/PyMVPA/PyMVPA/issues/559">https://github.com/PyMVPA/PyMVPA/issues/559</a>).<br>
      <br>
      <div class="moz-cite-prefix">On 24/11/2017 21:57, marco tettamanti
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:7db9d606-0482-03c1-0da7-5274d921eda3@gmail.com"> <font
          size="-1"><font face="Arial">Dear Matteo,<br>
            thank you for kindly replying!<br>
            <br>
            Yes, I do have the latest versions of joblib (0.11) and
            sklearn (0.19.1); see the bottom of this email.<br>
            <br>
            The problem seems independent of whether I run it in jupyter,
            or invoke ipython or python directly in the console.<br>
            <br>
            I am now wondering whether there may be something wrong in
            my snippet.<br>
            When first running your gist, I encountered an:<br>
            <br>
                    UnboundLocalError: local variable 'best_clf'
            referenced before assignment<br>
            <br>
            which I solved by moving the best_clf declaration a few
            lines down:<br>
            <br>
            -----------------------------------------<br>
            #best_clfs = {}  #moved down 7 lines<br>
            confusion = ConfusionMatrix()<br>
            verbose(1, "Estimating error using nested CV for model
            selection")<br>
            partitioner = partitionerCD<br>
            splitter = Splitter('partitions')<br>
            tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit,
            partitions) <br>
                                     for isplit, partitions in
            enumerate(partitionerCD.generate(fds)))<br>
            best_clfs = {}<br>
            for tm in tms:<br>
                confusion += tm.ca.stats<br>
                best_clfs[tm.measure.descr] =
            best_clfs.get(tm.measure.descr, 0) + 1<br>
            -----------------------------------------<br>
            <br>
            But now, running the snippet in ipython/python specifically
            for the SVM parallelization issue, I saw the error message
            popping up again:<br>
            <br>
                     UnboundLocalError: local variable 'best_clf'
            referenced before assignment<br>
            <br>
            Could this be the culprit? As a reminder, the full snippet I
            am using is included in my previous email.<br>
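            <br>
            In case it is useful: the only place I can see where best_clf could
            end up unassigned is inside select_best_clf. best_clf is only set
            when a classifier actually yields an error estimate, so if every
            classifier in clfs raises LearnerError (or clfs is empty), the
            final verbose() call references best_clf before assignment. A
            defensive version of the relevant lines (just a sketch, I have not
            tested it beyond my snippet):<br>
            <pre>
    best_clf, best_error = None, None
    for clf in clfs:
        cv = CrossValidation(clf, partitionerCD)
        try:
            error = np.mean(cv(dstrain_))
        except LearnerError, e:
            # skip classifiers that fail to learn/predict on this data
            continue
        if best_error is None or error < best_error:
            best_clf, best_error = clf, error
        verbose(4, "Classifier %s cv error=%.2f" % (clf.descr, error))
    if best_clf is None:
        raise RuntimeError("no classifier could be trained on this split")
    verbose(3, "Selected the best out of %i classifiers %s with error %.2f"
            % (len(clfs), best_clf.descr, best_error))
    return best_clf, best_error
            </pre>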
            <br>
            Thank you and very best wishes,<br>
            Marco<br>
            <br>
            <br>
            In [21]: mvpa2.wtf(exclude=['runtime','process']) ##other
            possible arguments (['sources', <br>
            Out[21]: <br>
            Current date:   2017-11-24 21:23<br>
            PyMVPA:<br>
             Version:       2.6.3<br>
             Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785<br>
             Path:         
            /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc<br>
             Version control (GIT):<br>
             GIT information could not be obtained due
            "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"<br>
            SYSTEM:<br>
             OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian
            4.13.4-2 (2017-10-15)<br>
             Distribution:  debian/buster/sid<br>
            EXTERNALS:<br>
             Present:       atlas_fsl, cPickle, ctypes, good
            scipy.stats.rv_continuous._reduce_func(floc,fscale), good
            scipy.stats.rv_discrete.ppf, griddata, gzip, h5py, hdf5,
            ipython, joblib, liblapack.so, libsvm, libsvm verbosity
            control, lxml, matplotlib, mdp, mdp ge 2.4, mock, nibabel,
            nose, numpy, numpy_correct_unique, pprocess, pylab, pylab
            plottable, pywt, pywt wp reconstruct, reportlab, running
            ipython env, scipy, skl, statsmodels<br>
             Absent:        afni-3dinfo, atlas_pymvpa, cran-energy,
            datalad, elasticnet, glmnet, good scipy.stats.rdist,
            hcluster, lars, mass, nipy, nipy.neurospin, numpydoc,
            openopt, pywt wp reconstruct fixed, rpy2, scipy.weave, sg ge
            0.6.4, sg ge 0.6.5, sg_fixedcachesize, shogun, shogun.krr,
            shogun.lightsvm, shogun.mpd, shogun.svmocas,
            shogun.svrlight, weave<br>
             Versions of critical externals:<br>
              ctypes      : 1.1.0<br>
              h5py        : 2.7.1<br>
              hdf5        : 1.10.0<br>
              ipython     : 5.5.0<br>
              joblib      : 0.11<br>
              lxml        : 4.1.0<br>
              matplotlib  : 2.0.0<br>
              mdp         : 3.5<br>
              mock        : 2.0.0<br>
              nibabel     : 2.3.0dev<br>
              numpy       : 1.13.1<br>
              pprocess    : 0.5<br>
              pywt        : 0.5.1<br>
              reportlab   : 3.4.0<br>
              scipy       : 0.19.1<br>
              skl         : 0.19.1<br>
             Matplotlib backend: TkAgg<br>
            <br>
            <br>
          </font></font>
        <blockquote type="cite"><font size="-1"><font face="Arial">On
              24/11/2017 17:32, Matteo Visconti di Oleggio Castello
              wrote:<br>
              <br>
              Hi Marco,<br>
              <br>
              some ideas in random order<br>
              <br>
              - what version of sklearn/joblib are you using? I would
              make sure to use<br>
              the latest version (0.11), perhaps not importing it from
              sklearn (unless<br>
              you have the latest sklearn version, 0.19.1)<br>
              - are you running the code in a jupyter notebook? There
              might be some<br>
              issues with that (see <a class="moz-txt-link-freetext"
                href="https://github.com/joblib/joblib/issues/174">https://github.com/joblib/joblib/issues/174</a>).
              As a<br>
              test you might try to convert your notebook to a script
              and then run it.<br>
              <br>
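              For example (assuming the notebook is called nested_cv.ipynb):<br>
              <pre>
jupyter nbconvert --to script nested_cv.ipynb
python nested_cv.py
              </pre>
              <br>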
            </font></font><br>
          <br>
          <div class="moz-cite-prefix">On 23/11/2017 12:07, marco
            tettamanti wrote:<br>
          </div>
          <blockquote type="cite"
            cite="mid:77c2f8e5-6d2c-68c8-ef14-a430601be7a0@gmail.com"> <font
              size="-1"><font face="Arial">Dear Matteo (and others),<br>
                sorry, I am again asking for your help!<br>
                <br>
                I have experimented with the analysis of my dataset
                using an adaptation of your joblib-based gist.<br>
                As I wrote before, it works perfectly, but not with some
                classifiers: SVM classifiers always cause the code to
                terminate with an error.<br>
                <br>
                If I set:<br>
                        myclassif=clfswh['!gnpp','!skl','!svm']    #Note
                that 'gnpp' and 'skl' were excluded for independent
                reasons<br>
                the code runs through without errors.<br>
                <br>
                However, with:<br>
                        myclassif=clfswh['!gnpp','!skl']<br>
                I get the following error:<br>
                        MaybeEncodingError: Error sending result:
                '[TransferMeasure(measure=SVM(svm_impl='C_SVC',
                kernel=LinearLSKernel(), weight=[], probability=1,<br>
                         weight_label=[]),
                splitter=Splitter(space='partitions'),
                postproc=BinaryFxNode(space='targets'),
                enable_ca=['stats'])]'. Reason: 'TypeError("can't<br>
                        pickle SwigPyObject objects",)'<br>
                <br>
                After googling for what may be causing this particular error,
                I found that the situation improves slightly (i.e.,
                more splits are executed, sometimes even all of them) by
                importing the following:<br>
                        import os<br>
                        from sklearn.externals.joblib import Parallel,
                delayed<br>
                        from sklearn.externals.joblib.parallel import
                parallel_backend<br>
                and then specifying just before 'Parallel(n_jobs=2)': <br>
                        with parallel_backend('threading'):<br>
                However, in this case too, the code invariably
                terminates with a long error message (I only report an
                extract, but I can send the whole error message if
                needed):<br>
                        <type 'str'>: (<type
                'exceptions.UnicodeEncodeError'>,
                UnicodeEncodeError('ascii',<br>
                      
u'JoblibAttributeError\n___________________________________________________________________________\nMultiprocessing<br>
                      
exception:\n...........................................................................\n/usr/lib/python2.7/runpy.py
                in<br>
                      
                _run_module_as_main(mod_name=\'ipykernel_launcher\',
                alter_argv=1)\n    169     pkg_name =
                mod_name.rpartition(\'.\')[0]\n    170<br>
                       main_globals =
                sys.modules["__main__"].__dict__\n    171     if
                alter_argv:\n    172         sys.argv[0] = fname\n   
                173     return _run_code(code,<br>
                       main_globals, None,\n--> 174<br>
                <br>
              </font></font><br>
            I think I have sort of understood that the problem is due to
            some failure in pickling the parallelized jobs, but I have
            no clue whether or how it can be solved.<br>
            Do you have any suggestions?<br>
            <br>
            Thank you and very best wishes,<br>
            Marco<br>
            <br>
            p.s. This is again the full code:<br>
            <br>
            ########## * ##########<br>
            ##########<br>
            <br>
            PyMVPA:<br>
             Version:       2.6.3<br>
             Hash:          9c07e8827819aaa79ff15d2db10c420a876d7785<br>
             Path:         
            /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc<br>
             Version control (GIT):<br>
             GIT information could not be obtained due
            "/usr/lib/python2.7/dist-packages/mvpa2/.. is not under GIT"<br>
            SYSTEM:<br>
             OS:            posix Linux 4.13.0-1-amd64 #1 SMP Debian
            4.13.4-2 (2017-10-15)<br>
            <br>
            <br>
            print fds.summary()<br>
            Dataset: 36x534@float32, <sa:
            chunks,targets,time_coords,time_indices>, <fa:
            voxel_indices>, <a:
            imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim><br>
            stats: mean=0.548448 std=1.40906 var=1.98546 min=-5.41163
            max=9.88639<br>
            No details due to large number of targets or chunks.
            Increase maxc and maxt if desired<br>
            Summary for targets across chunks<br>
              targets mean std min max #chunks<br>
                C      0.5 0.5  0   1     18<br>
                D      0.5 0.5  0   1     18<br>
            <br>
            <br>
            #Evaluate prevalent best classifier with nested
            crossvalidation<br>
            verbose.level = 5<br>
            <br>
            partitionerCD = ChainNode([NFoldPartitioner(cvtype=2,
            attr='chunks'), Sifter([('partitions', 2), ('targets', ['C',
            'D'])])], space='partitions')<br>
            # training partitions<br>
            for fds_ in partitionerCD.generate(fds):     <br>
                training = fds[fds_.sa.partitions == 1]<br>
                #print list(zip(training.sa.chunks,
            training.sa.targets))<br>
            # testing partitions<br>
            for fds_ in partitionerCD.generate(fds):     <br>
                testing = fds[fds_.sa.partitions == 2]<br>
                #print list(zip(testing.sa.chunks, testing.sa.targets))<br>
            <br>
            #Helper function (partitionerCD recursively acting on
            dstrain, rather than on fds):<br>
            def select_best_clf(dstrain_, clfs):<br>
                """Select best model according to CVTE<br>
                Helper function which we will use twice -- once for
            proper nested<br>
                cross-validation, and once to see how big an optimistic
            bias due<br>
                to model selection could be if we simply provide an
            entire dataset.<br>
                Parameters<br>
                ----------<br>
                dstrain_ : Dataset<br>
                clfs : list of Classifiers<br>
                  Which classifiers to explore<br>
                Returns<br>
                -------<br>
                best_clf, best_error<br>
                """<br>
                best_error = None<br>
                for clf in clfs:<br>
                    cv = CrossValidation(clf, partitionerCD)<br>
                    # unfortunately we don't have ability to reassign
            clf atm<br>
                    # cv.transerror.clf = clf<br>
                    try:<br>
                        error = np.mean(cv(dstrain_))<br>
                    except LearnerError, e:<br>
                        # skip the classifier if data was not
            appropriate and it<br>
                        # failed to learn/predict at all<br>
                        continue<br>
                    if best_error is None or error < best_error:<br>
                        best_clf = clf<br>
                        best_error = error<br>
                    verbose(4, "Classifier %s cv error=%.2f" %
            (clf.descr, error))<br>
                verbose(3, "Selected the best out of %i classifiers %s
            with error %.2f"<br>
                        % (len(clfs), best_clf.descr, best_error))<br>
                return best_clf, best_error<br>
            <br>
            # This function will run all classifiers for one single
            partition<br>
            myclassif=clfswh['!gnpp','!skl'][5:6]  #Testing a single SVM
            classifier<br>
            def _run_one_partition(isplit, partitions,
            classifiers=myclassif): #see §§<br>
                verbose(2, "Processing split #%i" % isplit)<br>
                dstrain, dstest = list(splitter.generate(partitions))<br>
                best_clf, best_error = select_best_clf(dstrain,
            classifiers)<br>
                # now that we have the best classifier, lets assess its
            transfer<br>
                # to the testing dataset while training on entire
            training<br>
                tm = TransferMeasure(best_clf,
            splitter,postproc=BinaryFxNode(mean_mismatch_error,space='targets'),
            enable_ca=['stats'])<br>
                tm(partitions)<br>
                return tm<br>
            <br>
            #import os<br>
            #from sklearn.externals.joblib import Parallel, delayed<br>
            #from sklearn.externals.joblib.parallel import
            parallel_backend<br>
            <br>
            # Parallel estimate error using nested CV for model
            selection<br>
            confusion = ConfusionMatrix()<br>
            verbose(1, "Estimating error using nested CV for model
            selection")<br>
            partitioner = partitionerCD<br>
            splitter = Splitter('partitions')<br>
            # Here we are using joblib Parallel to parallelize each
            partition<br>
            # Set n_jobs to the number of available cores (or how many
            you want to use)<br>
            #with parallel_backend('threading'):<br>
            #    tms =
            Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit,
            partitions) <br>
            tms = Parallel(n_jobs=2)(delayed(_run_one_partition)(isplit,
            partitions)<br>
                                     for isplit, partitions in
            enumerate(partitionerCD.generate(fds)))<br>
            # Parallel returns a list with the results of each parallel
            loop, so we need to<br>
            # unravel it to get the confusion matrix<br>
            best_clfs = {}<br>
            for tm in tms:<br>
                confusion += tm.ca.stats<br>
                best_clfs[tm.measure.descr] =
            best_clfs.get(tm.measure.descr, 0) + 1<br>
            <br>
            ##########<br>
            ########## * ##########<br>
            <br>
            <br>
            <br>
            <br>
            <br>
            <br>
            <br>
            <div class="moz-cite-prefix">On 13/11/2017 09:12, marco
              tettamanti wrote:<br>
            </div>
            <blockquote type="cite"
              cite="mid:b63d0b4c-525b-b1d0-54b5-45c765778976@gmail.com">
              <font size="-1"><font face="Arial">Dear Matteo,<br>
                  thank you so much, this is precisely the kind of thing I
                  was looking for: it works like a charm!<br>
                  Ciao,<br>
                  Marco<br>
                  <br>
                </font></font>
              <blockquote type="cite"><font size="-1"><font face="Arial">On
                    11/11/2017 21:44, Matteo Visconti di Oleggio
                    Castello wrote:<br>
                    <br>
                    Hi Marco,<br>
                    <br>
                    in your case, I would then recommend looking into
                    joblib to parallelize<br>
                    your for loops (<a class="moz-txt-link-freetext"
                      href="https://pythonhosted.org/joblib/parallel.html">https://pythonhosted.org/joblib/parallel.html</a>).<br>
                    <br>
                    As an example, here's a gist containing part of
                    PyMVPA's nested_cv<br>
                    example where I parallelized the loop across
                    partitions. I feel this is<br>
                    what you might want to do in your case, since you
                    have a lot more folds.<br>
                    <br>
                    Here's the gist:<br>
                    <a class="moz-txt-link-freetext"
                      href="https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0">https://gist.github.com/mvdoc/0c2574079dfde78ea649e7dc0a3feab0</a><br>
                  </font></font><br>
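                  The basic joblib pattern is just (a toy example, not specific
                  to PyMVPA):<br>
                  <pre>
from joblib import Parallel, delayed

def process_fold(i):
    # whatever one iteration of your current for loop does
    return i ** 2

# serial version:  results = [process_fold(i) for i in range(324)]
results = Parallel(n_jobs=8)(delayed(process_fold)(i) for i in range(324))
                  </pre>
                  <br>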
                <br>
                <div class="moz-cite-prefix">On 10/11/2017 21:13, marco
                  tettamanti wrote:<br>
                </div>
                <blockquote type="cite"
                  cite="mid:f898a361-4187-9f95-23f2-d9cb994cace9@gmail.com">
                  Dear Matteo,<br>
                  thank you for the willingness to look into my code.<br>
                  <br>
                  This is taken almost verbatim from <a
                    class="moz-txt-link-freetext"
                    href="http://dev.pymvpa.org/examples/nested_cv.html">http://dev.pymvpa.org/examples/nested_cv.html</a>,
                  except for the leave-one-pair-out partitioning, and a
                  slight reduction in the number of classifiers (in the
                  original example, they are around 45).<br>
                  <br>
                  Any help or suggestion would be greatly appreciated!<br>
                  All the best,<br>
                  Marco<br>
                  <br>
                  <br>
                  ########## * ##########<br>
                  ##########<br>
                  <br>
                  PyMVPA:<br>
                   Version:       2.6.3<br>
                   Hash:         
                  9c07e8827819aaa79ff15d2db10c420a876d7785<br>
                   Path:         
                  /usr/lib/python2.7/dist-packages/mvpa2/__init__.pyc<br>
                   Version control (GIT):<br>
                   GIT information could not be obtained due
                  "/usr/lib/python2.7/dist-packages/mvpa2/.. is not
                  under GIT"<br>
                  SYSTEM:<br>
                   OS:            posix Linux 4.13.0-1-amd64 #1 SMP
                  Debian 4.13.4-2 (2017-10-15)<br>
                  <br>
                  <br>
                  print fds.summary()<br>
                  Dataset: 36x534@float32, <sa:
                  chunks,targets,time_coords,time_indices>, <fa:
                  voxel_indices>, <a:
                  imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim><br>
                  stats: mean=0.548448 std=1.40906 var=1.98546
                  min=-5.41163 max=9.88639<br>
                  No details due to large number of targets or chunks.
                  Increase maxc and maxt if desired<br>
                  Summary for targets across chunks<br>
                    targets mean std min max #chunks<br>
                      C      0.5 0.5  0   1     18<br>
                      D      0.5 0.5  0   1     18<br>
                  <br>
                  <br>
                  #Evaluate prevalent best classifier with nested
                  crossvalidation<br>
                  verbose.level = 5<br>
                  <br>
                  partitionerCD = ChainNode([NFoldPartitioner(cvtype=2,
                  attr='chunks'), Sifter([('partitions', 2), ('targets',
                  ['C', 'D'])])], space='partitions')<br>
                  # training partitions<br>
                  for fds_ in partitionerCD.generate(fds):      <br>
                      training = fds[fds_.sa.partitions == 1]<br>
                      #print list(zip(training.sa.chunks,
                  training.sa.targets))<br>
                  # testing partitions<br>
                  for fds_ in partitionerCD.generate(fds):      <br>
                      testing = fds[fds_.sa.partitions == 2]<br>
                      #print list(zip(testing.sa.chunks,
                  testing.sa.targets))<br>
                  <br>
                  #Helper function (partitionerCD recursively acting on
                  dstrain, rather than on fds):<br>
                  def select_best_clf(dstrain_, clfs):<br>
                      """Select best model according to CVTE<br>
                      Helper function which we will use twice -- once
                  for proper nested<br>
                      cross-validation, and once to see how big an
                  optimistic bias due<br>
                      to model selection could be if we simply provide
                  an entire dataset.<br>
                      Parameters<br>
                      ----------<br>
                      dstrain_ : Dataset<br>
                      clfs : list of Classifiers<br>
                        Which classifiers to explore<br>
                      Returns<br>
                      -------<br>
                      best_clf, best_error<br>
                      """<br>
                      best_error = None<br>
                      for clf in clfs:<br>
                          cv = CrossValidation(clf, partitionerCD)<br>
                          # unfortunately we don't have ability to
                  reassign clf atm<br>
                          # cv.transerror.clf = clf<br>
                          try:<br>
                              error = np.mean(cv(dstrain_))<br>
                          except LearnerError, e:<br>
                              # skip the classifier if data was not
                  appropriate and it<br>
                              # failed to learn/predict at all<br>
                              continue<br>
                          if best_error is None or error <
                  best_error:<br>
                              best_clf = clf<br>
                              best_error = error<br>
                          verbose(4, "Classifier %s cv error=%.2f" %
                  (clf.descr, error))<br>
                      verbose(3, "Selected the best out of %i
                  classifiers %s with error %.2f"<br>
                              % (len(clfs), best_clf.descr, best_error))<br>
                      return best_clf, best_error<br>
                  <br>
                  #Estimate error using nested CV for model selection:<br>
                  best_clfs = {}<br>
                  confusion = ConfusionMatrix()<br>
                  verbose(1, "Estimating error using nested CV for model
                  selection")<br>
                  partitioner = partitionerCD<br>
                  splitter = Splitter('partitions')<br>
                  for isplit, partitions in
                  enumerate(partitionerCD.generate(fds)):<br>
                      verbose(2, "Processing split #%i" % isplit)<br>
                      dstrain, dstest =
                  list(splitter.generate(partitions))<br>
                      best_clf, best_error = select_best_clf(dstrain,
                  clfswh['!gnpp','!skl'])<br>
                      best_clfs[best_clf.descr] =
                  best_clfs.get(best_clf.descr, 0) + 1<br>
                      # now that we have the best classifier, lets
                  assess its transfer<br>
                      # to the testing dataset while training on entire
                  training<br>
                      tm = TransferMeasure(best_clf, splitter,<br>
                                          
                  postproc=BinaryFxNode(mean_mismatch_error,
                  space='targets'), enable_ca=['stats'])<br>
                      tm(partitions)<br>
                      confusion += tm.ca.stats<br>
                  <br>
                  ##########<br>
                  ########## * ##########<br>
                  <br>
                  <br>
                  <br>
                  <br>
                  <br>
                  <br>
                  <blockquote type="cite"><font size="-1"><font
                        face="Arial">On 10/11/2017 15:43, Matteo
                        Visconti di Oleggio Castello wrote:<br>
                          <br>
                        What do you mean by "cycling over approx 40
                        different classifiers"? Are<br>
                        you testing different classifiers? If that's the
                        case, a possibility is to<br>
                        create a script that takes the classifier type as an
                        argument and runs the<br>
                        classification across all folds. That way you
                        can submit 40 jobs and<br>
                        parallelize across classifiers (see the sketch at the
                        end of this message).<br>
                        <br>
                        If that's not the case, because the folds are
                        independent and deterministic<br>
                        I would create a script that performs the
                        classification on blocks of folds<br>
                        (say folds 1 to 30, 31 to 60, etc.), and then
                        submit separate jobs, so<br>
                        as to parallelize there.<br>
                        <br>
                        I think that if you send a snippet of the code
                        you're using, it will be<br>
                        easier to see which are the good points for
                        parallelization.<br>
                      </font></font><br>
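                    A very rough sketch of the first approach (script and file
                    names are hypothetical; it assumes the dataset was saved to
                    disk beforehand, e.g. with h5save):<br>
                    <pre>
# run_one_classifier.py -- submit one job per classifier tag
import sys
import numpy as np
from mvpa2.suite import *

clf_tag = sys.argv[1]           # e.g. 'svm', 'gnb', ...
fds = h5load('fds.hdf5')        # dataset saved beforehand with h5save(fds, ...)
for clf in clfswh[clf_tag]:     # all classifiers carrying that tag
    cv = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])
    res = cv(fds)
    print clf.descr, np.mean(res)
                    </pre>
                    <br>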
                    <br>
                    <div class="moz-cite-prefix">On 10/11/2017 09:57,
                      marco tettamanti wrote:<br>
                    </div>
                    <blockquote type="cite"
                      cite="mid:9a2fb439-554c-360c-6aef-4c0693dc1a1a@gmail.com">
                      <pre wrap="">Dear Matteo and Nick,
thank you for your responses.
I take the occasion to ask some follow-up questions, because I am struggling to 
make pymvpa2 computations faster and more efficient.

I often find myself in the situation of giving up on a particular analysis, 
because it is going to take far more time than I can bear (weeks, months!). This 
happens particularly with searchlight permutation testing (gnbsearchlight is 
much faster, but does not support pprocess), and nested cross-validation.
As for the latter, for example, I recently wanted to run nested cross-validation 
in a sample of 18 patients and 18 controls (1 image per subject), training the 
classifiers to discriminate patients from controls in a leave-one-pair-out 
partitioning scheme. This yields 18*18=324 folds. For a small ROI of 36 voxels, 
cycling over approx 40 different classifiers takes about 2 hours for each fold 
on a decent PowerEdge T430 Dell server with 128GB RAM. This means approx. 27 
days for all 324 folds!
The same server is equipped with 32 CPUs. With full parallelization, the same 
analysis may be completed in less than one day. This is the reason for my 
interest and questions about parallelization.

Is there anything that you experts do in such situations to speed up or make the 
computation more efficient?

Thank you again and best wishes,
Marco


</pre>
                      <blockquote type="cite">
                        <pre wrap="">On 10/11/2017 10:07, Nick Oosterhof wrote:

There have been some plans / minor attempts to use parallelisation more
widely, but as far as I know we only support pprocess, and only for (1)
searchlight; (2) surface-based voxel selection; and (3) hyperalignment. I
do remember that parallelisation of other functions was challenging due to
some issues with getting the conditional attributes set right, but this was
a long time ago.

</pre>
                        <blockquote type="cite">
                          <pre wrap="">On 09/11/2017 18:35, Matteo Visconti di Oleggio Castello wrote:

Hi Marco,
AFAIK, there is no support for parallelization at the level of
cross-validation. Usually for a small ROI (such as a searchlight) and with
standard CV schemes, the process is quite fast, and the bottleneck is
really the number of searchlights to be computed (for which parallelization
exists).

In my experience, we tend to parallelize at the level of individual
participants; for example we might set up a searchlight analysis with
however many n_procs you have available, and then submit one such job for every
participant to a cluster (using either torque or condor).
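
For instance, a rough sketch of one per-participant job (relying on the nproc
argument of sphere_searchlight, i.e. the pprocess-based parallelization PyMVPA
already ships; the names here are illustrative only):

    cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(),
                         errorfx=mean_mismatch_error)
    sl = sphere_searchlight(cv, radius=3, nproc=8)  # use 8 cores on one node
    res = sl(fds)  # run once per participant, one cluster job each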

HTH,
Matteo

On 09/11/2017 10:08, marco tettamanti wrote:
</pre>
                          <blockquote type="cite">
                            <pre wrap="">Dear all,
forgive me if this has already been asked in the past, but I was wondering
whether there has been any development meanwhile.

Is there any chance that one can generally apply parallel computing (multiple
CPUs or clusters) with pymvpa2, in addition to what is already implemented for
searchlight (pprocess)? That is, also for general cross-validation, nested
cross-validation, permutation testing, RFE, etc.?

Has anyone had successful experience with parallelization schemes such as
ipyparallel, condor, or others?

Thank you and best wishes!
Marco

</pre>
                          </blockquote>
                        </blockquote>
                      </blockquote>
                    </blockquote>
                    <br>
                    <pre class="moz-signature" cols="80">-- 
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
IRCCS San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: <a class="moz-txt-link-abbreviated" href="mailto:tettamanti.marco@hsr.it">tettamanti.marco@hsr.it</a>
Skype: mtettamanti
<a class="moz-txt-link-freetext" href="http://scholar.google.it/citations?user=x4qQl4AAAAAJ">http://scholar.google.it/citations?user=x4qQl4AAAAAJ</a></pre>
                  </blockquote>
                  <br>
                </blockquote>
              </blockquote>
              <br>
            </blockquote>
          </blockquote>
        </blockquote>
        <br>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>