[pymvpa] PCA transformation prior to SVM classification

Thu Nov 25 15:56:17 UTC 2010

Hi all,
I'm using PyMVPA to classify EEG data with an SVM. In order to improve
accuracy, I am looking at transformations (such as time-frequency
decomposition) of the data prior to feeding it to the classifier.
I stumbled upon some methods mostly used in the BCI domain, such as
PCA, CSP (common spatial patterns), DSP (discriminative spatial
patterns) and the like.
I now have 2 questions:

1. Running PCA, CSP, etc. on the whole dataset _prior_ to feeding it to a
classifier looks to me like a case of 'double-dipping', as all trials
(training and test) are used to identify the components. Thus all
trials in the dataset given to the classifier are actually
inter-dependent. Am I right there?

2. If 1. is true, then one could still run the PCA (etc.) just on the
training set in each split*, and then run an SVM. Does this make any
sense, or does a suitable SVM kernel already take care of this?

Thanks for any comments or thoughts on this,
Greetings, Jakob

*something like:
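from mvpa.suite import *  # assumes the PyMVPA 0.x namespace (mvpa.suite)

# intent: the PCA is trained on the training portion of each split only
clf = MappedClassifier(LinearCSVMC(), PCAMapper())

cv = CrossValidatedTransferError(
    TransferError(clf),
    NFoldSplitter(),
    enable_ca=['results'])
cv(dataset)  # dataset: the EEG dataset, loaded elsewhere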
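
For illustration, the idea in point 2 can also be written out without
PyMVPA. The following is only a minimal NumPy sketch under stated
assumptions: the helper names (fit_pca, apply_pca, nearest_mean_predict,
cross_validate) are hypothetical, the data are synthetic, and a simple
nearest-class-mean rule stands in for the SVM. The point it shows is that
the PCA mean and axes are estimated from the training trials of each fold
and are only applied to the held-out trials.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Estimate the PCA mean and principal axes from training trials only."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(X, mean, components):
    """Project trials using a previously fitted mean and set of axes."""
    return (X - mean) @ components.T


def nearest_mean_predict(Z_train, y_train, Z_test):
    """Stand-in classifier (NOT an SVM): pick the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def cross_validate(X, y, n_folds=5, n_components=3):
    """Return per-fold accuracy, refitting the PCA inside every fold."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    accuracies = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # The held-out trials play no part in estimating the components.
        mean, comps = fit_pca(X[train_mask], n_components)
        Z_train = apply_pca(X[train_mask], mean, comps)
        Z_test = apply_pca(X[test_idx], mean, comps)
        pred = nearest_mean_predict(Z_train, y[train_mask], Z_test)
        accuracies.append(float((pred == y[test_idx]).mean()))
    return accuracies


# Synthetic stand-in for real trials: 60 trials x 10 features, two classes.
rng = np.random.default_rng(0)
y = np.tile([0, 1], 30)          # alternating labels, so every fold has both
X = rng.normal(size=(60, 10))
X[y == 1, 0] += 4.0              # class difference placed in the first feature
accuracies = cross_validate(X, y)
print(accuracies)
```

The same fit-on-training, apply-to-test pattern would carry over to CSP
or DSP, where it arguably matters more, since those filters are
estimated using the class labels.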