[pymvpa] Q: Time-offset of category labels in Haxby 2001 dataset. Was this ever properly resolved?

Sun Mar 6 16:16:32 UTC 2011

Dear PyMVPA folks,

A while ago on this list there was some discussion
of whether the category-labels
attached to each TR in the Haxby 2001 data-set:
http://dev.pymvpa.org/datadb/haxby2001.html
correspond to:
1. The times at which the stimuli were presented to the subjects
or
2. The times at which the subjects' HRFs are showing responses to those stimuli.

Here's the final element in that discussion thread:
http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2009q1/000321.html

Was this question ever properly resolved?

I am planning to submit a paper with some new analyses of this dataset,
so I want to make extra-sure that I am using the right labels.

Comparing the dataset with the methods-description
in the 2001 Science paper itself, here's what I can glean so far:

The data has a TR of 2.5s,
with 8 stimulus-blocks of 24s each, and 9 rest periods of 12s each
(one rest period after each block, and another one at the start of each run).

That makes each run last 300s:
Blocks = 8*24 = 192s
Rest = 9*12 = 108s

300s is 120 TRs:
300 / 2.5 = 120
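
For anyone who wants to re-check the arithmetic, here is a minimal Python sketch.
The variable names are my own, and the numbers are just the ones quoted above
from the paper's methods description:

```python
# Run-length arithmetic, using only the timings quoted above.
TR_S = 2.5                  # repetition time, in seconds
N_BLOCKS, BLOCK_S = 8, 24   # stimulus blocks per run, seconds per block
N_RESTS, REST_S = 9, 12     # rest periods per run, seconds per rest period

run_s = N_BLOCKS * BLOCK_S + N_RESTS * REST_S   # total run length in seconds
n_trs = run_s / TR_S                            # run length in TRs

print(run_s, n_trs)
```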

However, in the dataset at
http://data.pymvpa.org/datasets/haxby2001
each run has 121 TRs.

In each of those runs, the first 6 TRs are labeled as rest,
which makes a rest period of 6*2.5 = 15s.
However, as mentioned above, the 2001 paper says that
each rest period, including the one at the start of each run, was 12s.
12 seconds would be 4.8 TRs.

This suggests to me (but it certainly doesn't guarantee it!)
that the downloadable data have one rest-period TR
pre-attached at the beginning of each run,
which in effect shifts the labels by 2.5s.
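
A rough way to check that this hypothesis is at least consistent with the
6 rest labels is sketched below in Python. It assumes (my convention, not
something stated in the paper or the dataset docs) that a TR only gets a
stimulus label if it begins at or after stimulus onset, so the number of
leading rest TRs is the onset time in TRs, rounded up:

```python
import math

TR_S = 2.5             # repetition time, in seconds
REST_S = 12.0          # initial rest period according to the paper
LABELED_REST_TRS = 6   # leading rest labels seen in the downloadable data
EXTRA_TRS = 121 - 120  # TRs per run in the data minus TRs implied by the paper

onset_in_trs = REST_S / TR_S   # 4.8 TRs from t=0 to the first stimulus

# Assumed convention: a TR is labeled "rest" unless it starts at/after onset.
rest_trs_no_offset = math.ceil(onset_in_trs)                # no extra TR
rest_trs_with_offset = math.ceil(onset_in_trs + EXTRA_TRS)  # one TR prepended

print(rest_trs_no_offset, rest_trs_with_offset, LABELED_REST_TRS)
```

Under that convention, only the variant with one prepended TR reproduces the
6 leading rest labels; a different rounding convention could change this.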

When taking into account the HRF-delay in assigning category-labels to TRs,
I personally code it such that a category gets listed as occurring when the
HRF-convolved stimulus-time-series for that category is above its
run-mean value.

For a TR of 2.5s, this works out to an offset of one TR.
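
The labeling rule described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the double-gamma HRF shape
(peak gamma with shape 6, undershoot gamma with shape 16, ratio 1/6), the
0.5s working grid, and the single 24s block starting at 12s in a 300s run are
all choices made for the example, not necessarily what anyone's real pipeline
uses:

```python
import math

DT_S = 0.5          # fine time grid; TR, onset and block length are multiples
STEPS_PER_TR = 5    # 2.5s TR / 0.5s grid step
RUN_STEPS = 600     # 300s run
ONSET_STEP = 24     # first block assumed to start at 12s
BLOCK_STEPS = 48    # 24s block

def double_gamma_hrf(t):
    """Assumed HRF shape: gamma(shape 6) peak minus 1/6 of a gamma(shape 16)."""
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def convolve(signal, kernel):
    """Plain discrete convolution, truncated to the length of the signal."""
    out = [0.0] * len(signal)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            if i + j < len(out):
                out[i + j] += s * k * DT_S
    return out

hrf = [double_gamma_hrf(i * DT_S) for i in range(64)]   # 32s of kernel
boxcar = [1.0 if ONSET_STEP <= i < ONSET_STEP + BLOCK_STEPS else 0.0
          for i in range(RUN_STEPS)]

predicted = convolve(boxcar, hrf)
at_trs = predicted[::STEPS_PER_TR]    # one sample per TR (120 per run)
run_mean = sum(at_trs) / len(at_trs)

# A TR counts as "this category" when the prediction exceeds its run mean.
labels = [value > run_mean for value in at_trs]

first_tr = labels.index(True)                      # first above-mean TR
first_stim_tr = -(-ONSET_STEP // STEPS_PER_TR)     # first TR starting in the block
lag_trs = first_tr - first_stim_tr

print(first_tr, first_stim_tr, lag_trs)
```

With these assumed parameters, the first above-mean TR comes one TR after the
first TR that begins during the block. A different HRF shape or threshold
could give a different lag, so treat this as a sanity check only.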

So, my current guess is that the fact that the downloadable Haxby data
appear to have one extra label of rest inserted at the start of each run
means that, in effect, the labels already have the correct offsets
for corresponding to evoked BOLD responses, rather than raw stimulus-onsets.

I am not sure about this, though, and would love to hear other
people's thoughts.
In particular, one key question is left unaddressed:
the timings in the 2001 Science paper imply that each run had 120 TRs,
but the downloadable data have 121 TRs in each run.
Where did the extra TR's worth of data come from?
The best-case scenario would be that the first TR in each run,
which gets assigned the label "rest" and which in effect offsets all
the label-times,
is a dummy-scan volume that was collected 2.5s before the
moment that counted as t=0 for the stimulus-presentation timings.
However, I have no idea if that is the case or not.

Any help greatly appreciated.

Raj