[pymvpa] RFE problem w/ Multi-Class SVM classifier

Sat Nov 21 02:11:20 UTC 2009

Hello,

I'd like to perform some feature selection (using Recursive Feature
Elimination) on a data set I'm analyzing, but I haven't been able to
make it work.

I could not find a complete example of how to use (rather than just
create and/or train) a FeatureSelectionClassifier; I think one would be
useful. The one example in the documentation showing how to train a
FeatureSelectionClassifier did it by calling

clf.train(dataset)

... and then calling dataset.selectFeatures(clf.feature_ids)

This didn't work for me (see the code and errors below). I was working
with a different classifier (a multi-class linear SVM instead of kNN)
and a slightly different data set (a masked data set loaded from a
MATLAB matrix), but it seems that the same principles should apply.
What am I doing wrong?
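The documented pattern described above (train, then call `dataset.selectFeatures(clf.feature_ids)`) boils down to keeping a subset of the columns of the samples x features matrix. A minimal NumPy sketch of that idea, where `select_features` is a hypothetical helper for illustration and not PyMVPA's implementation of `selectFeatures`:

```python
import numpy as np


def select_features(samples, feature_ids):
    """Hypothetical stand-in for dataset.selectFeatures(ids): keep only the
    listed columns of a (n_samples, n_features) array, in ascending order."""
    ids = np.sort(np.asarray(feature_ids, dtype=int))
    return samples[:, ids]


samples = np.arange(20).reshape(4, 5)   # toy data: 4 samples x 5 features
reduced = select_features(samples, [3, 0])
# reduced has shape (4, 2): columns 0 and 3 of the original
```

Every id has to be a valid column index for the array it is applied to; an id at or beyond the current number of features makes the indexing fail.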

I suspect my problem may have something to do with the possible bug
that I wrote to you about previously
(http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2009q4/000806.html).

To review: the function clf.getSensitivityAnalyzer(), rather than
combining feature sensitivities across the pairwise class comparisons
(this is a multi-class classifier), was combining across features. Thus
I got 3 sensitivity values (one each for the 1vs2, 1vs3, and 2vs3
comparisons) rather than 649 values (one per feature, i.e. voxel, in my
data set). I was able to read out the per-feature sensitivities by calling

clf.getSensitivityAnalyzer(transformer=None,combiner=None),

but now it seems that the RFE algorithm needs a correct combiner to
work, and I could not find any documentation on what to pass besides
"None" (combiner=??).
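To make the shape problem concrete: a one-vs-one multi-class linear SVM on 3 labels yields one weight vector per pair of classes, i.e. a 3 x 649 array here. A combiner should collapse the pair axis and keep one value per feature; collapsing the feature axis instead gives exactly the 3 values described above. The sketch below uses random numbers in place of real SVM weights, and `combine_over_pairs` / `combine_over_features` are hypothetical names. It only illustrates the axis choice; which axis holds the pairs depends on how the analyzer lays out its output, and whether PyMVPA's `combiner=` argument accepts an arbitrary callable like this is an assumption.

```python
import numpy as np

N_PAIRS, N_FEATURES = 3, 649  # 1vs2, 1vs3, 2vs3; 649 voxels in the ROI mask

rng = np.random.RandomState(0)
# Stand-in for per-pair linear SVM weights (random numbers, not real data):
# rows = class pairs, columns = features.
pair_sens = rng.randn(N_PAIRS, N_FEATURES)


def combine_over_pairs(sens):
    """Hypothetical combiner: sum of absolute sensitivities across the
    class-pair axis (axis 0), leaving one value per feature."""
    return np.abs(sens).sum(axis=0)


def combine_over_features(sens):
    """The behaviour described in the post: reducing along the feature
    axis (axis 1) leaves one value per class pair."""
    return np.abs(sens).sum(axis=1)


per_feature = combine_over_pairs(pair_sens)   # shape (649,)
per_pair = combine_over_features(pair_sens)   # shape (3,)
```

Summing absolute values is just one choice; a mean or maximum over pairs would also produce one value per feature, which is what a per-feature ranking such as RFE needs.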

Help? Any idea what's going on?

The code I'm using and the error messages I get are provided below.

Thanks (again) for your time,

Mark

~~~~~~~~~~~~~

from scipy.io import loadmat
from mvpa.suite import *

# 4-D .mat file of 2x2x2 voxels: 440x80x60x69
DatFile = 'WholeBrainMatFile.mat'
# contains a mask for 649 voxels in the Lateral Occipital area
MaskFile = 'ROI_Mask.mat'
AttrFile = 'ConditionLabels.txt'

D = loadmat(DatFile)
Data = D['Data']
M = loadmat(MaskFile)
MaskMat = M['Mask']
attr = SampleAttributes(AttrFile)

# create masked data set
PyDat = MaskedDataset(samples=Data, labels=attr.labels, chunks=attr.chunks,
                      mask=MaskMat)
zscore(PyDat,perchunk=True,targetdtype='float32')
# PyDat is: <Dataset / float32 440 x 649 uniq: 8 chunks 3 labels>

# Now: feature selection:

splitter = NFoldSplitter(cvtype=1)
rfesvm_split = SplitClassifier(LinearCSVMC(), splitter)
FtSelClf = FeatureSelectionClassifier(
    # use a linear SVM classifier
    clf=LinearCSVMC(),
    # on features selected via RFE
    feature_selection=RFE(
        # based on sensitivity of a clf which does splitting internally
        sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(),  # transformer=None
        # and whose internal error we use
        transfer_error=ConfusionBasedError(
            rfesvm_split,
            confusion_state="confusion"),
        # remove 20% of features at each step
        feature_selector=FractionTailSelector(
            0.2, mode='discard', tail='lower'),
        enable_states=['feature_ids'],
        # update sensitivity at each step
        update_sensitivity=True),
    descr='LinSVM+RFE(splits_avg)')

# Option 1: simple training and check on feature IDs
print FtSelClf.trained # prints "False"
FtSelClf.train(PyDat)
print FtSelClf.trained # prints "True"
print FtSelClf.feature_ids
# (Generates error - see below)

# Option 2: Run cross-validated transfer error
terr = TransferError(FtSelClf)
splitter = NFoldSplitter(cvtype=1)
cvterr = CrossValidatedTransferError(terr, splitter)
Err = cvterr(PyDat)
print Err
# (Also generates error - having NOT run option 1)

To be clear - I only used EITHER Option 1 or Option 2 (one or the  
other was always commented out when I ran the code).
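For reference, the elimination schedule in the script above (FractionTailSelector(0.2, mode='discard', tail='lower') with update_sensitivity=True) can be pictured as the loop below. This is a simplified NumPy sketch, not PyMVPA's RFE: `rfe`, `class_mean_spread` and `nearest_mean_error` are hypothetical names, the sensitivity is a between-class spread measure swapped in for the split SVM's sensitivities, and the error is a plain training error rather than the split classifier's confusion-based error. The property it deliberately checks is the one at stake in this thread: the sensitivity must hold exactly one value per remaining feature.

```python
import numpy as np


def rfe(samples, labels, sensitivity, error, fraction=0.2, min_features=1):
    """Simplified recursive feature elimination (illustration only).

    At each step: record the error on the current feature set, recompute
    one sensitivity value per remaining feature, and discard the lowest
    `fraction` of features (at least one). Returns the column ids of the
    feature set with the lowest recorded error, plus the full history.
    """
    ids = np.arange(samples.shape[1])
    history = []
    while True:
        current = samples[:, ids]
        history.append((error(current, labels), ids.copy()))
        if len(ids) <= min_features:
            break
        sens = sensitivity(current, labels)  # recomputed every step
        if np.shape(sens) != (len(ids),):
            raise ValueError("need one sensitivity value per remaining feature")
        n_discard = max(1, int(fraction * len(ids)))
        n_discard = min(n_discard, len(ids) - min_features)
        keep = np.sort(np.argsort(sens)[n_discard:])  # drop the lower tail
        ids = ids[keep]
    best_ids = min(history, key=lambda step: step[0])[1]
    return best_ids, history


def class_mean_spread(samples, labels):
    """Placeholder sensitivity: variance of the per-class means of each feature."""
    means = np.array([samples[labels == c].mean(axis=0) for c in np.unique(labels)])
    return means.var(axis=0)


def nearest_mean_error(samples, labels):
    """Placeholder error: training error of a nearest-class-mean classifier."""
    classes = np.unique(labels)
    means = np.array([samples[labels == c].mean(axis=0) for c in classes])
    dists = ((samples[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return np.mean(classes[dists.argmin(axis=1)] != labels)


# Toy data: 3 classes x 30 samples, 20 features; only features 0-2 are informative.
rng = np.random.RandomState(1)
labels = np.repeat([1, 2, 3], 30)
samples = rng.randn(90, 20)
samples[:, :3] += 2.0 * labels[:, None]
best_ids, history = rfe(samples, labels, class_mean_spread, nearest_mean_error)
```

On the toy data the informative features 0-2 survive until only three features are left. Picking the minimum-error set from the history is simply the stopping rule chosen for this sketch.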

Option 1 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 77, in <module>
    print FtSelClf.feature_ids
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 1099, in __getattribute__
    return collections[known_attribs[index]].getvalue(index)
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/state.py", line 353, in getvalue
    return self._items[index].value
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 66, in _getVirtual
    return self._get()
  File "/opt/local/lib/python2.5/site-packages/mvpa/misc/attributes.py", line 227, in _get
    raise UnknownStateError("Unknown yet value of %s" % (self.name))
mvpa.misc.exceptions.UnknownStateError: Exception: Unknown yet value of feature_ids

And Option 2 gives the error:

Traceback (most recent call last):
  File "./FeatureSelection_Example.py", line 81, in <module>
    Err = cvterr(PyDat)
  File "/opt/local/lib/python2.5/site-packages/mvpa/measures/base.py", line 105, in __call__
    result = self._call(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/algorithms/cvtranserror.py", line 173, in _call
    result = transerror(split[1], split[0])
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1283, in __call__
    self._precall(testdataset, trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/transerror.py", line 1239, in _precall
    self.__clf.train(trainingdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/base.py", line 354, in train
    result = self._train(dataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/clfs/meta.py", line 1058, in _train
    self.__testdataset)
  File "/opt/local/lib/python2.5/site-packages/mvpa/featsel/rfe.py", line 268, in __call__
    wdataset = wdataset.selectFeatures(selected_ids)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/mapped.py", line 130, in selectFeatures
    sdata = Dataset.selectFeatures(self, ids=ids, sort=sort)
  File "/opt/local/lib/python2.5/site-packages/mvpa/datasets/base.py", line 1018, in selectFeatures
    new_data['samples'] = self._data['samples'][:, ids]
IndexError: index (2) out of range (0<=index<1) in dimension 1
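The final IndexError is NumPy refusing a column index that is out of range for the array (newer NumPy versions word the message differently). The sketch below reproduces it under an assumption that matches the suspicion raised earlier in the post but is not a confirmed diagnosis: if the sensitivity analyzer returns one value per class pair (3) instead of one per remaining feature, the ids chosen from it range over 0-2 no matter how many features are left, so they stop being valid column indices once the working dataset has fewer than 3 features. The numbers are made up and `check_sensitivity` is a hypothetical guard, not a PyMVPA function.

```python
import numpy as np

# Working dataset after several elimination steps: 440 samples, 1 feature left.
working = np.zeros((440, 1), dtype=np.float32)

# Sensitivities with one value per class pair (1vs2, 1vs3, 2vs3) instead of
# one per remaining feature; the numbers are made up.
pair_sens = np.array([0.3, 0.1, 0.7])

# Discarding the lowest value and keeping the rest yields ids [0, 2]:
# positions in the 3-element vector, not valid column indices here.
selected_ids = np.sort(np.argsort(pair_sens)[1:])

try:
    _ = working[:, selected_ids]
    raised = False
except IndexError:
    raised = True  # column 2 does not exist in a 1-column array


def check_sensitivity(sens, n_features):
    """Hypothetical guard: fail early, with a clearer message, when the
    sensitivity vector does not hold one value per feature."""
    if np.shape(sens) != (n_features,):
        raise ValueError("got %d sensitivities for %d features"
                         % (np.size(sens), n_features))
```

A check like `check_sensitivity(pair_sens, 649)` would flag the 3-vs-649 mismatch at the first elimination step instead of several steps later inside `selectFeatures`.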

~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Lescroart
(say it LESS-qua)

University of Southern California
Neuroscience Graduate Program
Image Understanding Lab
Email: mark.lescroart at usc.edu
Cell: (213) 447-0752
