[pymvpa] NFoldSplitter question

Wed Mar 4 13:54:57 UTC 2009

Dear all,

We are experimenting with NFoldSplitter and get fewer splits than we
expect. Here is a minimal script:
----
import mvpa.suite as M
import numpy as N
from scipy.misc.common import comb

if __name__ == '__main__':

   n = 5 # number of samples
   f = 2 # number of features

   k = 3 # cvtype

   labels = N.arange(n) # labels
   d = M.Dataset(samples=N.arange(n).repeat(f).reshape(n, f),
                 labels=labels) # dataset

   splitter = M.NFoldSplitter(cvtype=k, nrunspersplit=1, permute=False,
                              count=None)
   split = splitter(d)

   # Enumerate and count splits
   i = 0
   for train, test in split:
       i += 1
       print train.samples[:,0], test.samples[:,0]
   print "Splitter generated",i,"splits"
   print "expected:",comb(n, k, exact=1)
----
We get this output:
----
[3 4] [0 1 2]
[2 4] [0 1 3]
[2 3] [0 1 4]
[1 4] [0 2 3]
[1 3] [0 2 4]
[1 2] [0 3 4]
Splitter generated 6 splits
expected: 10
----
Why does the splitter generate fewer splits (6) than the number of
all combinations, C(5, 3) = 10? According to the docstring, the two
should be equal. Note that we are using the latest tarballs of NumPy,
PyMVPA, etc.
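
For comparison, here is a minimal sketch in plain Python (no PyMVPA
involved) that enumerates the splits we expected and lists the test sets
that are absent from the output above. The helper names expected_splits
and missing_test_sets are our own and purely illustrative, and
OBSERVED_TEST_SETS is copied by hand from the output above. This only
describes which splits are missing; it does not claim to reproduce how
NFoldSplitter works internally.

```python
from itertools import combinations

# Test sets copied by hand from the splitter output shown above.
OBSERVED_TEST_SETS = [
    (0, 1, 2), (0, 1, 3), (0, 1, 4),
    (0, 2, 3), (0, 2, 4), (0, 3, 4),
]


def expected_splits(n, k):
    """Return all (train, test) pairs where test holds k of the n sample ids."""
    samples = range(n)
    splits = []
    for test in combinations(samples, k):
        # Everything not held out for testing goes into the training set.
        train = tuple(s for s in samples if s not in test)
        splits.append((train, test))
    return splits


def missing_test_sets(n, k, observed):
    """Return the expected test sets that do not occur in `observed`."""
    seen = set(tuple(sorted(t)) for t in observed)
    return [test for _, test in expected_splits(n, k) if test not in seen]


if __name__ == '__main__':
    splits = expected_splits(5, 3)
    for train, test in splits:
        print(train, test)
    print("expected:", len(splits))
    print("missing:", missing_test_sets(5, 3, OBSERVED_TEST_SETS))
```

Compared against the output above, the absent test sets are [1 2 3],
[1 2 4], [1 3 4] and [2 3 4], i.e. exactly those that do not contain
sample 0.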

Thanks,

Susanne
Emanuele