[pymvpa] High standard deviation in cross-validation on fMRI data

Sun Nov 23 12:14:08 UTC 2014

Hello all,

I am trying to perform 2-class classification.
Each data sample is a full-brain scan (using a whole-brain mask).
The data was provided in AFNI format after motion and slice-timing correction.
I've converted it to NIfTI.

I'm performing the following using the PyMVPA library:
- Loading a NIfTI file
- Linearly detrending the data
- Z-scoring against the baseline-labeled scans (I've tried both chunk-wise and
non-chunk-wise)
- Feature selection using ANOVA (top 1000 features), followed by a LinearCSVMC
classifier
- 4-fold cross-validation
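
To make the z-scoring step concrete, here is a minimal sketch of what scaling
against baseline scans means for a single voxel. It is plain Python rather than
PyMVPA; the function name zscore_against_baseline and the numbers are made up
for illustration, and using the population standard deviation is an assumption
of the sketch.

```python
from statistics import mean, pstdev


def zscore_against_baseline(values, labels, baseline_label):
    """Z-score one voxel's time series using only baseline-labeled scans."""
    baseline = [v for v, l in zip(values, labels) if l == baseline_label]
    mu = mean(baseline)
    sigma = pstdev(baseline)  # population std; an assumption of this sketch
    if sigma == 0:
        # Constant baseline: only remove the mean, avoid dividing by zero
        return [v - mu for v in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values of a single voxel within one chunk
values = [10.0, 12.0, 15.0, 9.0, 11.0, 14.0]
labels = ['3', '3', '31', '3', '3', '32']

z = zscore_against_baseline(values, labels, '3')
baseline_z = [v for v, l in zip(z, labels) if l == '3']
```

For the chunk-wise variant, the same function would be applied separately to
the scans of each chunk.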

The cross-validation results are close to chance level (~55%).
That would be OK if the standard deviation were small, but the standard
deviation turns out to be between 0.9-0.17. That is, some folds of the
4-fold cross-validation do well and others do poorly.
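
As a rough reference point, the spread across folds can be compared with the
spread that binomial sampling noise alone would give at chance. The sketch
below uses only the standard library; the per-fold accuracies and the number
of test samples per fold are hypothetical values chosen for illustration, not
taken from my data, and chance_std is a helper name invented here.

```python
from math import sqrt
from statistics import mean, stdev


def chance_std(n_test, p=0.5):
    """Std of accuracy expected from binomial noise with n_test test samples."""
    return sqrt(p * (1.0 - p) / n_test)


# Hypothetical per-fold accuracies and test-set size, for illustration only
fold_accuracies = [0.70, 0.45, 0.60, 0.45]
n_test_per_fold = 20

observed_mean = mean(fold_accuracies)
observed_std = stdev(fold_accuracies)       # sample std across folds
expected_std = chance_std(n_test_per_fold)  # spread from sampling noise alone

print("mean accuracy: %.3f" % observed_mean)
print("observed std:  %.3f" % observed_std)
print("binomial std:  %.3f" % expected_std)
```

The binomial figure assumes independent test samples. Consecutive fMRI scans
are usually correlated, so the real noise floor is likely higher than this
estimate.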

I don't understand why it performs well in some folds and poorly in others.
To narrow the problem down, I've also tried 2-fold cross-validation on a
specific fold that scored high. It also gives a high standard deviation (the
results of the two folds were far apart).

My questions are:
- Does anyone know of (or has anyone faced) this problem?
- How can I debug/understand why there is a huge difference between the
results of each run?
- Could the problem be in the original data? If so, what could have gone
wrong?
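
One check that might be relevant to the debugging question is whether the two
classes are balanced within every chunk, since a fold is tested on a single
chunk. The following is a sketch in plain Python; counts_per_chunk is a
hypothetical helper, and the targets and chunks lists are made-up stand-ins
for the sample attributes read from my map file.

```python
from collections import Counter, defaultdict


def counts_per_chunk(targets, chunks):
    """Count how many samples of each target fall into each chunk."""
    per_chunk = defaultdict(Counter)
    for target, chunk in zip(targets, chunks):
        per_chunk[chunk][target] += 1
    return per_chunk


# Hypothetical sample attributes, for illustration only
targets = ['31', '32', '31', '31', '32', '32', '31', '31']
chunks = [0, 0, 1, 1, 2, 2, 3, 3]

table = counts_per_chunk(targets, chunks)
for chunk in sorted(table):
    print("chunk %s: %s" % (chunk, dict(table[chunk])))
```

A chunk that contains only one class, or very few samples, would be a
candidate explanation for a fold that scores very differently from the rest.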

Below, I've attached the PyMVPA script I'm using and its output.
The output contains the error of each of the 4 cross-validation folds.
I'm also printing ds.summary() (before detrending, after detrending and
after z-scoring), in case that is enlightening in some way.

Please help me,
Thank you,
Gal Star
-------------- next part --------------
import os
import sys

import numpy
from mvpa2.suite import *

# Command-line arguments: scan type and subject identifier
scan_type = sys.argv[1]
sub = sys.argv[2]

img_name = '4D_scans.nii.gz'
map_name = 'map.txt'
source = os.path.join('/home/gals/converted_data/sub_brik_data', scan_type, sub)
#source  = '/home/gals/converted_data/sub_brik_data/ester/' + scan_type
#source  = '/home/gals/converted_data/sub_brik_data/gal/'+ sub + '/'
#source  = '/home/gals/converted_data/sub_brik_data/gal/Dov/'

print "type: %s" % scan_type
print "sub: %s" % sub

#########################################################
# Read mvpa sample attributes definition from text file #
#########################################################
attr=SampleAttributes(os.path.join(source,map_name))
print "after sampleAttributes"

#fds=fmri_dataset(samples=os.path.join(source,img_name),targets=attr.targets,chunks=attr.chunks)
fds=fmri_dataset(samples=os.path.join(source,img_name),targets=attr.targets,chunks=attr.chunks,mask='/home/gals/masks/brain_mask.nii.gz')
#fds=fmri_dataset(samples=os.path.join(source,img_name),targets=attr.targets,chunks=attr.chunks,mask='/home/gals/converted_data/sub_brik_data/ester/for_gal/mask.nii.gz')

print "passed fmri dataset"
print "before detrending:"
print fds.summary()

poly_detrend(fds, polyord=1, chunks_attr='chunks')
print "after detrending:"
print fds.summary()

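# Keep the two classes of interest ('31', '32') plus the baseline scans ('3')
interesting = numpy.array([l in ['32','31','3'] for l in fds.sa.targets])
fds = fds[interesting]
# Z-score each feature using parameters estimated from the baseline scans only
zscore(fds, param_est=('targets', ['3']))
# Drop the baseline scans; only the two classes go into classification
interesting = numpy.array([l in ['32','31'] for l in fds.sa.targets])
fds = fds[interesting]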

#zscore(fds,chunks_attr=None, dtype='float64')
#print fds.summary()

print "after z-scoring:"
print fds.summary()

#clf = LinearCSVMC()
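# Linear SVM on the 1000 features with the highest ANOVA F-scores
clf = FeatureSelectionClassifier(
    LinearCSVMC(),
    SensitivityBasedFeatureSelection(
        OneWayAnova(),
        FixedNElementTailSelector(1000, tail='upper', mode='select')))
# NFoldPartitioner() leaves one chunk out per fold
cv = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])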

# cv returns one error value per fold
error = cv(fds)
accuracy = 1 - numpy.mean(error)

print "The list of results per fold:" 
print error.samples

print "Additional stats:"
print cv.ca.stats.as_string(description=True)

print "And The Accuracy:"
print "Accuracy is %f" % accuracy

#fout = open(logger, "w")
#fout.write(str(accuracy))
#print "Error for %i-fold cross-validation on %i-class problem: %f" \
#% (len(fds.uniquechunks), len(fds.uniquelabels), error)
#fout.close()

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gal_result
Type: application/octet-stream
Size: 8532 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20141123/a1860746/attachment.obj>