[Pkg-exppsy-pymvpa-dev] Question about new fmri_dataset

Valentin Haenel valentin.haenel at gmx.de
Tue Dec 29 13:07:59 UTC 2009


* Yaroslav Halchenko <yoh at dartmouth.edu> [091229]:
> Hey -- I wrote a comparison thingie (lazy me -- I only did it because I had
> to recall what is actually stored in those haynes results), so:

I think I wrote a similar 'thingy', which is in the analysis script. A way to
compare nifti files should maybe go into pymvpa, since two people have already
needed it (you and me! :) )
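
Something like this rough sketch is what I have in mind -- the function name is
hypothetical, it just reuses the pynifti calls from the attached script:

    import numpy
    from nifti.niftiimage import NiftiImage

    def nifti_files_close(fname1, fname2, epsilon=1e-6):
        """ report how closely the voxel data of two nifti files agree """
        d1 = NiftiImage(fname1).getScaledData().astype(numpy.float64)
        d2 = NiftiImage(fname2).getScaledData().astype(numpy.float64)
        # fraction of voxels that agree within the given tolerance
        frac = (numpy.abs(d1 - d2) < epsilon).mean()
        # overall linear correlation between the two volumes
        corr = numpy.corrcoef(d1.ravel(), d2.ravel())[0, 1]
        print 'fraction within epsilon: %.4f, correlation: %.2f' % (frac, corr)
        return frac == 1.0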

> 
> head4:/data/haynes_ts/results/original/results_dec_ind_2tasksets_full_fir_spm64_dev
> $> for i in {1..8}; do /data/haynes_ts/code/valentin/compare_res.py /data/haynes_ts/results/original/results_dec_ind_2tasksets_full_fir_spm64_dev/RES_r4_S1_DCG_BIN_${i}_M.hdr /data/haynes_ts/data/SRMAP_BETAS_nifti/AS1T/model_2tasksets_full_fir/timebin_${i}.nii.gz; done
> Converting data #0 from acc to errors
> Difference 13.140612 with norm of our 112.355532, max abs 0.375008. Correlation between results 0.90
> Converting data #0 from acc to errors
> Difference 12.915092 with norm of our 120.112187, max abs 0.375008. Correlation between results 0.93
> Converting data #0 from acc to errors
> Difference 12.766073 with norm of our 114.589962, max abs 0.375008. Correlation between results 0.89
> Converting data #0 from acc to errors
> Difference 12.905451 with norm of our 116.070426, max abs 0.375008. Correlation between results 0.88
> Converting data #0 from acc to errors
> Difference 12.313136 with norm of our 111.751468, max abs 0.375008. Correlation between results 0.85
> Converting data #0 from acc to errors
> Difference 12.485157 with norm of our 111.813251, max abs 0.500000. Correlation between results 0.84
> Converting data #0 from acc to errors
> Difference 12.928463 with norm of our 109.936985, max abs 0.375008. Correlation between results 0.85
> Converting data #0 from acc to errors
> Difference 13.213565 with norm of our 113.086388, max abs 0.375008. Correlation between results 0.84
> 
> So, they look very close but not an exact match... how closely did your
> results match before? Maybe I am taking some other results for matlab?
> (see the path above)

Looks like the right file

> Although altogether I would not be too surprised, since the convergence
> tolerance was low, so there is no guarantee that two SVMs on noisy data
> would perform exactly the same ;-)

Yeah, you could also adjust the epsilon value in your floating-point
comparison and see whether that changes anything.
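
For example, sweeping epsilon over a few orders of magnitude would show whether
the remaining mismatch is just numerical noise -- an untested sketch, assuming
ni1 and ni2 are loaded NiftiImage instances as in compare_NiftiImage from the
attached script:

    # absolute voxel-wise difference between the two result volumes
    diff = abs(ni1.getScaledData() - ni2.getScaledData())
    for eps in (1.0, 0.1, 0.01, 0.001):
        # fraction of voxels that agree within the current tolerance
        print 'epsilon=%g -> fraction matching: %.4f' % (eps, (diff < eps).mean())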

Cool stuff all the way!

V-

> On Tue, 29 Dec 2009, Valentin Haenel wrote:
> 
> > Sweet, thanks so much, I'll run it tomorrow. :-)
> 
> > V-
> 
> > * Yaroslav Halchenko <yoh at dartmouth.edu> [091229]:
> > > here is your script adjusted -- see attached
> 
> 
> > > On Mon, 28 Dec 2009, Valentin Haenel wrote:
> 
> > > > Hey,
> 
> > > > I am in the middle of rewriting my searchlight analysis script to use the new
> > > > fmri_dataset.
> 
> > > > How would I construct a new fmri_dataset using the new factory method,
> > > > including a flatten mapper and a name for the input space?
> 
> > > > V-
> 
> > > > _______________________________________________
> > > > Pkg-exppsy-pymvpa-dev mailing list
> > > > Pkg-exppsy-pymvpa-dev at lists.alioth.debian.org
> > > > http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa-dev
> 
> 
> > > -- 
> > > Yaroslav O. Halchenko
> > > Postdoctoral Fellow,   Department of Psychological and Brain Sciences
> > > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> > > Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> > > WWW:   http://www.linkedin.com/in/yarik        
> 
> > > #!/usr/bin/env python
> > > #coding=utf-8
> 
> > > """ This is vals useful script for repeating the taksset decoding study
> 
> > >     it can be used from within ipython and a usual call would look like:
> 
> > >     import analysis
> > >     analysis.set_dataroot('SUBJECT','DEC_TYPE')
> > >     ds = analysis.load_fir_dataset(2)
> > >     sl_map = analysis.run_searchlight(ds)
> 
> > >     and then depending on how you want to look at it:
> 
> > >     masked_orig_sl_map = analysis.map_to_masked_array(ds, sl_map)
> > >     analysis.plot_volume(masked_orig_sl_map)
> 
> > >     or alternatively
> 
> > >     analysis.save_back_nifti(ds, sl_map, FILENAME)
> 
> > > """
> 
> 
> > > import os, operator, numpy, time
> 
> > > from mvpa.datasets.mri import fmri_dataset
> > > from mvpa.datasets.splitters import NFoldSplitter
> > > from mvpa.clfs.svm import LinearCSVMC
> > > from mvpa.clfs.transerror import TransferError
> > > from mvpa.algorithms.cvtranserror import CrossValidatedTransferError
> > > from mvpa.measures.searchlight import sphere_searchlight
> 
> > > if __debug__:
> > > 	from mvpa.base import debug
> 
> > > from nifti.niftiimage import NiftiImage
> 
> > > subjects = ['AS1T', 'HSET', 'KSDT', 'MK4T', 'SI1T', 'TR1T', 'HF4T', 'JH1T', 'ME1T', 'RF1T', 'SU2T', 'ZK1T']
> 
> > > dec_types = {'tasksets':'model_2tasksets_full_fir', 'targets':'model_targets_fir'}
> 
> > > dataroot = None
> > > # this is set to the root of the experiment
> > > exp_root = '/data/haynes_ts/data/SRMAP_BETAS_nifti'
> 
> 
> 
> > > """ mapping of beta index to timebin 
> > >     each inner list represents one timebin
> > >     and within each inner list the encoding is:
> > >         [cond A run 1, cond B run 1,
> > >          cond A run 2, cond B run 2,
> > >          cond A run 3, cond B run 3,
> > >          cond A run 4, cond B run 4]
> > > """
> > > beta_order = [[1, 9, 17, 25, 33, 41, 49, 57], \
> > >         [2, 10, 18, 26, 34, 42, 50, 58], \
> > >         [3, 11, 19, 27, 35, 43, 51, 59], \
> > >         [4, 12, 20, 28, 36, 44, 52, 60], \
> > >         [5, 13, 21, 29, 37, 45, 53, 61], \
> > >         [6, 14, 22, 30, 38, 46, 54, 62], \
> > >         [7, 15, 23, 31, 39, 47, 55, 63], \
> > >         [8, 16, 24, 32, 40, 48, 56, 64]]
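> > > # e.g. beta_order[1] starts with [2, 10, ...]: timebin 2 is modelled by
> > > # beta_0002 (cond A, run 1), beta_0010 (cond B, run 1), and so on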
> 
> > > def print_time(name, sec):
> > >     """ print seconds in terms of hours, minutes and secondd """
> > >     hours, remainder = divmod(sec, 3600)
> > >     minutes, seconds = divmod(remainder, 60)
> > >     print '%s took %d hours %d minutes and %d seconds' % (name, hours, minutes, seconds)
> 
> > > def print_timing(func):
> > >     """ decorator to estimate the timing of methods """
> > >     def wrapper(*arg, **kwargs):
> > >         t1 = time.time()
> > >         res = func(*arg , **kwargs)
> > >         t2 = time.time()
> > >         print_time(func.func_name, t2-t1)
> > >         return res
> > >     return wrapper
> 
> > > # enable debug output for the searchlight call
> > > if __debug__:
> > >     debug.active += ["SLC"]
> 
> > > def idx_to_filename(index):
> > >     """ take the integer index of a beta and return a corresponding filename """
> > >     return "beta_" + str(index).zfill(4)
> 
> > > def cat_filename(filename):
> > >     """ concatenate root directory for the data and the filename """
> > >     return os.path.join(get_dataroot(), filename)
> 
> > > def set_dataroot(subject, dec_type):
> > >     """ concatenate exp_root, subject and dec_type together """
> > >     # first make sure all the stuff does exist
> > >     if not os.path.exists(exp_root):
> > >         raise ValueError('Your exp_root does not exist: \n'+exp_root)
> > >     if subject not in subjects:
> > >         raise ValueError('Your subject: \"'+subject+'\" does not exist.\n'
> > >             +'possible values are: '+str(subjects))
> > >     if dec_type not in dec_types.keys():
> > >         raise ValueError('Your dec_type: \"'+dec_type+'\" does not exist\n'
> > >             +'possible values are: '+str(dec_types.keys()))
> > >     # then setup dataroot
> > >     global dataroot
> > >     dataroot = os.path.join(exp_root,subject,dec_types[dec_type])
> > >     print dataroot
> 
> > > def get_dataroot():
> > >     if dataroot is None:
> > >         raise ValueError('dataroot is still None, set it with set_dataroot')
> > >     return dataroot
> 
> > > @print_timing
> > > def load_fir_dataset(timebin):
> > >     """ load the dataset for a given timebin """
> 
> > >     if timebin not in range(1, 9):
> > >         raise ValueError("when loading an FIR dataset timebin must be between 1 and 8")
> 
> > >     # first we load the mask
> > >     # since it needs to be 42x64x64 (coming from 1x42x64x64) we need to do some 
> > >     # array acrobatics first
> 
> > >     print 'loading the mask from: ', cat_filename('mask')
> > >     m = NiftiImage(cat_filename('mask'))
> > >     print 'reshaping the mask'
> > >     m = m.asarray()[0]
> > >     print 'new shape is : ', m.shape
> 
> > >     print 'loading the dataset for timebin', timebin
> > >     print 'from: ' , dataroot
> 
> 
> > >     # convert the 1-based timebin into a 0-based array index
> > >     timebin -= 1
> 
> > >     # convert the weird beta indexes into absolute filenames
> > >     beta_filenames = [cat_filename(idx_to_filename(i))
> > >                       for i in beta_order[timebin]]
> > >     label_list = [1,2,1,2,1,2,1,2]
> > >     chunk_list = [1,1,2,2,3,3,4,4]
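> > >     # as far as i understand, fmri_dataset flattens the volumes with a
> > >     # FlattenMapper internally and stores each voxel's original (i,j,k)
> > >     # coordinates in the 'voxel_indices' feature attribute -- that is
> > >     # the input space that run_searchlight below refers to via
> > >     # space='voxel_indices'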
> > >     dataset = fmri_dataset(samples=beta_filenames,
> > >                        labels=label_list,
> > >                        chunks=chunk_list,
> > >                        mask=m)
> > >     print dataset
> > >     return dataset
> 
> > > def compare_NiftiDataset(ds1,ds2):
> > >     sample_eq = (ds1.samples == ds2.samples).all()
> > >     label_eq  = (ds1.labels == ds2.labels).all()
> > >     chunks_eq = (ds1.chunks == ds2.chunks).all()
> > >     return sample_eq and label_eq and chunks_eq
> 
> > > def compare_NiftiImage(ni1, ni2, epsilon=1.0):
> > >     """ compare the voxels of two nifti images using approximate epsilon style
> > >         floating point comparison 
> > >     """
> > >     comp_array = abs(ni1.getScaledData() - ni2.getScaledData()) < epsilon
> > >     print 'comparing 2 nifti images'
> > >     print ni1.filename
> > >     print ni2.filename
> > >     # the mean of the boolean array is the fraction of matching voxels
> > >     print 'Fraction of voxels equal within epsilon: ', comp_array.mean()
> 
> > > @print_timing
> > > def run_searchlight(ds):
> > >     """ run our searchlight, radius 4 voxels and return the raw sl_map """
> > >     clf = LinearCSVMC(C=1, epsilon=0.001)
> > >     # NFoldSplitter splits by chunks (leave-one-chunk-out) by default
> > >     cv = CrossValidatedTransferError(TransferError(clf),
> > >                                   NFoldSplitter(cvtype=1))
> > >     # set up the searchlight with a radius of 4 voxels
> > >     sl = sphere_searchlight(cv, radius=4, space='voxel_indices', nproc=8)
> > >     # optionally restrict the centers: center_ids=range(10000)
> > >     print "aight tigger, ill run yer searchlight now"
> > >     # run the searchlight
> > >     sl_map = [sl(ds_.copy(deep=False, sa=['labels', 'chunks'], fa=['voxel_indices'], a=[]))
> > >               for ds_ in ds]  #[:1]]
> > >     return sl_map
> 
> > > def map_to_masked_array(ds, sl_map):
> > >     """ map the sl_map back into a masked array for printin oslt """
> > >     orig_sl_map = ds.mapReverse(numpy.array(sl_map))
> > >     masked_orig_sl_map = numpy.ma.masked_array(orig_sl_map, mask=orig_sl_map == 0)
> > >     return masked_orig_sl_map
> 
> > > def make_normalized_accuracy_map(sl_map, chancelevel=50):
> > >     """ normalize the accuracy map to have values between -50 and 50, where 0
> > >         indicates chance level
> > >     """
> > >     return [i*100-chancelevel for i in sl_map]
> 
> > > def save_back_nifti(dataset, some_map, filename):
> > >     """ save an accuracy map back to nifti file for viewing with other"""
> > >     print "saving map in the original dataspace as: ", filename
> > >     ni = dataset.map2nifti(some_map)
> > >     ni.save(filename)
> 
> > > def plot_volume(vol):
> > >     """ extremely crude volume plotter """
> > >     # import pylab here, such that this code can be run on headless boxes
> > >     # provided this function isn't used
> > >     import pylab
> > >     id = 0
> > >     for i in xrange(6):
> > >         for j in xrange(7):
> > >             pylab.subplot(6, 7, id + 1)  # subplot indices are 1-based
> > >             pylab.imshow(vol[id], interpolation='nearest',
> > >                     aspect=1.25, cmap=pylab.cm.autumn)
> > >             pylab.clim(0.5, 0.75)
> > >             pylab.colorbar(shrink=0.75)
> > >             id += 1
> > >     pylab.show()
> 
> > > @print_timing
> > > def single_timebin(timebin):
> > >     """ run the searchlight for a single subject for a single timebin """
> > >     ds = load_fir_dataset(timebin)
> > >     sl_map = run_searchlight(ds)
> > >     #sl_map_norm = make_normalized_accuracy_map(sl_map)
> > >     #save_back_nifti(ds,sl_map_norm,cat_filename('timebin_'+str(timebin)+'.nii.gz'))
> 
> > > @print_timing
> > > def single_subject():
> > >     """ run the searchlight for one subject, for one decoding, for ALL timebins
> > >         set subject and decoding with:
> 
> > >         analysis.set_dataroot('SUBJECT','DEC_TYPE')
> > >     """
> > >     for t in range(1,9):
> > >         single_timebin(t)
> 
> > > @print_timing
> > > def fourtytwo():
> > >     """ load all eight timebins at once and run the searchlight on each """
> > >     data = [load_fir_dataset(i) for i in range(1,9)]
> > >     sl_map = run_searchlight(data)
> > >     #for timebin in range(1,9):
> > >     #    save_back_nifti(data[0],sl_map[timebin-1],cat_filename('timebin_'+str(timebin)+'.nii.gz'))
> 
> > > def main():
> > >     set_dataroot('AS1T','tasksets')
> > >     #single_timebin(1)
> > >     fourtytwo()
> > >     #single_subject()
> 
> > > if __name__ == "__main__":
> > >     main()
> 
> 
> 
> 
> 
> -- 
> Yaroslav O. Halchenko
> Postdoctoral Fellow,   Department of Psychological and Brain Sciences
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik        
> 


