EvLab fMRI preprocessing and analysis pipeline (evlab.mit.edu): a complete pipeline combining SPM and CONN functionality for task activation analyses of fMRI data

EL command-line quick-guide:

INITIALIZATION

Initialization and configuration routines

>> conn module EL init

EL module initialization

>> el remote on|off

(optional) connects/disconnects using SSH to remote server where data is stored (default: data is locally accessible) 

(note: use 'conn_remotely setup' in host to enable remote access, see Remote configuration for details)

>> el root.subjects [your_root_folder]

(optional) points to subjects folder (default: indicated by .../conn/modules/el/root.subjects symbolic link)

>> el root.pipelines [your_pipelines_folder]

(optional) points to preprocessing pipelines folder  (default:  .../conn/modules/el)

>> el root.tasks [your_task_folder]

(optional) points to analysis tasks folder  (default:  indicated by .../conn/modules/el/root.tasks symbolic link)

DATA PREPROCESSING

fMRI data import and preprocessing

>> el preprocessing [subject_ID]

Imports and preprocesses data from one subject using the default preprocessing pipeline (e.g. DefaultMNI_PlusStructural)

>> el preprocessing [subject_ID] [pipeline_ID]

Imports and preprocesses data from one subject using a custom preprocessing pipeline

>> el preprocessing.[main|alt] [subject_ID] [pipeline_ID]

Same as el preprocessing ...  but explicitly specifying whether the preprocessing output files are saved in the same folder as the input functional/anatomical files (preprocessing.main option, useful when running a fixed/single preprocessing pipeline on your data) or in a separate folder named after the specific pipeline_ID used (preprocessing.alt option, useful when running multiple/different preprocessing pipelines on the same dataset)

>> el preprocessing.append [subject_ID] [pipeline_ID_1] [pipeline_ID_2]

Runs additional preprocessing steps (pipeline_ID_2) on an already preprocessed dataset (pipeline_ID_1). This allows editing/modifying an already preprocessed dataset.

>> el preprocessing.qa [subject_ID] [pipeline_ID]

Creates Quality-Control plots evaluating the output of a preprocessing pipeline

>> el preprocessing.qa.plot [subject_ID] [pipeline_ID]

Displays previously-computed Quality-Control plots evaluating the output of a preprocessing pipeline

>> el select [subject_ID] [pipeline_ID]

>> [QC_vals, QC_names] = el('module', 'get', 'l2covariates')

Returns previously-computed Quality-Control variables evaluating the output of a preprocessing pipeline

>> el open [subject_ID] [pipeline_ID]

Displays subject's data in CONN gui

>> el submit preprocessing ...

Same as el preprocessing ...  but preprocessing steps are run by other nodes/computers within your HPC or cluster computer environment 

DATA ANALYSIS

fMRI data first-level model and contrast estimation

>> el model [subject_ID] [pipeline_ID] [model_ID]

Runs 1st-level model and contrast estimation

>> el model [subject_ID] [pipeline_ID] [model_ID] [options_ID]

Runs 1st-level model and contrast estimation using custom estimation options

>> el model.contrast [subject_ID] [pipeline_ID] [model_ID] [options_ID]

Runs 1st-level contrast estimation step only (for a previously-estimated 1st-level model)

>> el model.stats [subject_ID] [pipeline_ID] [model_ID]

Displays 1st-level contrast statistics

>> el model.plot [subject_ID] [pipeline_ID] [model_ID]

Displays 1st-level contrast effect-sizes

>> el model.qa [subject_ID] [pipeline_ID] [model_ID]

Creates Quality-Control plots evaluating the selected 1st-level analysis

>> el model.qa.plot [subject_ID] [pipeline_ID] [model_ID]

Displays previously-computed Quality-Control plots evaluating the selected 1st-level analysis

>> el submit model ...

Submits 1st-level analysis to be run by other nodes/computers within your HPC or cluster environment 


In Matlab type "help el" for additional details

EL data organization:

EL data and analyses are organized along three different axes: subjects, preprocessing pipelines, and models. 

Subjects

The functional data and analysis results are organized in a subject-centric manner, where all of the data associated with a given subject (or subject session) are stored under the same directory (even if different portions of these data may be used in different experiments).

The directory where all of the subjects' data are stored is called EL's root.subjects directory. This directory is initialized as .../conn/modules/el/root.subjects/. It may either be manually transformed into a symbolic link pointing somewhere else, or be defined programmatically (e.g. when multiple user groups share the same software installation) using the syntax:

>> el root.subjects your_subjects_folder

Within this location, each subdirectory contains the data from a different subject. Subject directory names in this folder are referred to in EL as subject IDs; they need to be unique for each subject but are otherwise arbitrarily defined (without whitespace or punctuation marks, e.g. valid subject IDs could be 835_FED_20200305a_3T2_PL2017, sub-LR01, or sub0001). All of the input and output data for an individual subject are located within this subject directory, [root.subjects]/[subject_ID].

example contents of SUBJECTS folder before starting EL preprocessing and analyses (the original raw data folder in this example may be named or organized arbitrarily)

Preprocessing pipelines

Each individual subject's functional and anatomical data can be preprocessed using multiple different preprocessing pipelines. Different pipelines are identified in EL by a pipeline-ID. Pipeline IDs need to be unique for each pipeline but are otherwise arbitrarily defined (without whitespaces or punctuation marks, e.g. valid pipeline IDs could be DefaultMNI, surface-space, or pipeline01).

Preprocessing description: The description and configuration options of all individual pipelines are grouped in the root.pipelines directory. This directory by default is initialized as .../conn/modules/el/ which contains a number of standard and commonly used preprocessing pipelines. You may specify a different directory programmatically using the syntax:

>> el root.pipelines your_pipelines_folder

Each individual preprocessing pipeline in EL is fully defined by a single configuration file (e.g. in [root.pipelines]/pipeline_preproc_[pipeline_ID].cfg). In addition to using the default pipelines in EL, users may manually edit these files or create their own configuration files. Different pipelines or pipeline modules may also be applied sequentially. EL may use any combination of preprocessing steps implemented in CONN, including functional realignment, slice-timing correction, susceptibility distortion correction, outlier identification, functional/anatomical coregistration, MNI-space normalization, tissue class segmentation, masking, and smoothing, as well as steps focused on denoising the functional data, such as aCompCor, scrubbing, and band-pass filtering. See the section Preprocessing pipeline files below for details about these files.

Preprocessing inputs: the input files of each preprocessing pipeline are the raw/original functional data (optionally also anatomical data, fieldmaps, etc.) stored within each subject directory.

Preprocessing outputs: the output files of a preprocessing pipeline are named following SPM prefix convention (see https://web.conn-toolbox.org/tutorials#h.p_Pk74qAeMP6Ml for details), and they are stored within each individual subject directory, specifically:

a) primary preprocessing pipeline: when using a command of the form el preprocessing.main ... to preprocess your data, the output files will be stored in the same folder as the original anatomical/functional data (note: if using DICOM to NIFTI conversion as part of preprocessing, this folder will be the NIFTI output folder, named nii)

e.g. output files stored in [root.subjects]/[subject_ID]/nii

or b) secondary preprocessing pipelines: when using a command of the form el preprocessing.alt ... to preprocess your data, the output files will be stored within a separate subfolder for each preprocessing pipeline 

e.g. output files stored in [root.subjects]/[subject_ID]/[pipeline_ID]

The former option is useful mainly when a single/common preprocessing pipeline is going to be run on all datasets, while the latter option is useful when multiple / different pipelines may be run on the same dataset. Both options may be used in isolation or simultaneously, for example to keep the primary preprocessing pipeline outputs in the same folder as your original files, while keeping the outputs of secondary pipelines separately. When referring to an already preprocessed dataset within the EL framework, use the keyword main to refer to  the output of the primary preprocessing pipeline, or directly the [pipeline_ID]  name to refer to the output of any secondary preprocessing pipeline.
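The folder convention described above can be summarized in a few lines of Python. This is only an illustrative sketch, not part of EL: the function name preproc_output_dir and its arguments are hypothetical, and the default input folder "nii" assumes the DICOM-to-NIFTI conversion case mentioned in a).

```python
from pathlib import PurePosixPath


def preproc_output_dir(root_subjects, subject_id, dataset, input_dir="nii"):
    """Return the folder expected to hold the preprocessed files.

    dataset is either the keyword "main" (primary pipeline) or a
    pipeline_ID (secondary pipeline). input_dir is the folder holding the
    original functional/anatomical files; "nii" is an assumed default that
    only applies when DICOM-to-NIFTI conversion is part of preprocessing.
    """
    subject_dir = PurePosixPath(root_subjects) / subject_id
    if dataset == "main":
        # primary pipeline: outputs sit next to the input files
        return subject_dir / input_dir
    # secondary pipeline: outputs sit in a folder named after the pipeline
    return subject_dir / dataset


main_dir = preproc_output_dir("/data/subjects", "sub0001", "main")
alt_dir = preproc_output_dir("/data/subjects", "sub0001", "DefaultMNI")
print(main_dir)
print(alt_dir)
```

If the original files live somewhere other than nii, pass that folder as input_dir.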

Models

Each preprocessed dataset can be analyzed using one or multiple different 1st/subject-level task activation models. Different models are identified in EL by a model-ID. Model IDs need to be unique for each model but are otherwise arbitrarily defined (without whitespaces or punctuation marks, e.g. valid model IDs could be LanglocSN, taskAB, or 139283). A separate 1st-level SPM analysis will be computed for each individual model-ID in order to estimate the model parameters (effects of interest such as task-related responses). 

Model description: all model definitions and configuration options are grouped in EL's root.tasks directory. By default this directory is initialized as .../conn/modules/el/root.tasks/. It may either be manually transformed into a symbolic link pointing somewhere else, or be defined programmatically using the syntax:

>> el root.tasks your_tasks_folder

Typically a model is fully defined by a combination of model-design files (.para), indicating the timing and sequence of events or blocks within each functional run, and model-association files (.cat), indicating the association between these experimental designs and the functional runs for each subject (see model-design files and model-association files sections below for details). In addition, a model may also specify details about the desired model-estimation options and the desired contrasts to be estimated by SPM (see model-estimation files and model-contrast files below for details)

Model outputs: First-level analysis outputs include all of the standard SPM analysis output files, such as an SPM.mat file containing the details of the estimated General Linear Model, beta_*.nii files containing maps of estimated effect-sizes for each model regressor, and con_*.nii and spmT_*.nii files containing maps of estimated contrast values and T-statistics, respectively, for each specified first-level contrast.  These files will be stored in a directory named [root.subjects]/[subject_ID]/firstlevel_[model_ID]  for primary preprocessing pipelines or [root.subjects]/[subject_ID]/[pipeline_ID]/results/firstlevel/[model_ID] for secondary preprocessing pipelines

example organization of SUBJECTS and MODELS folders defining information about a model named taskAB
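As a quick illustration of the two first-level output locations described under Model outputs, the sketch below builds the expected results folder for a given subject, dataset and model. The function name firstlevel_dir is hypothetical (it is not an EL command), and the code only reproduces the naming pattern stated above.

```python
from pathlib import PurePosixPath


def firstlevel_dir(root_subjects, subject_id, dataset, model_id):
    """Return the folder expected to hold the SPM first-level outputs.

    dataset is "main" for the primary preprocessing pipeline, or a
    pipeline_ID for a secondary preprocessing pipeline.
    """
    subject_dir = PurePosixPath(root_subjects) / subject_id
    if dataset == "main":
        return subject_dir / f"firstlevel_{model_id}"
    return subject_dir / dataset / "results" / "firstlevel" / model_id


primary = firstlevel_dir("/data/subjects", "sub0001", "main", "tasksAB")
secondary = firstlevel_dir("/data/subjects", "sub0001", "DefaultMNI", "tasksAB")
print(primary / "SPM.mat")
print(secondary / "SPM.mat")
```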

EL configuration files:

All information regarding data/preprocessing/analysis options in EL is specified through configuration files. These include: subject files indicating the location and file names of each subject's original/raw data; preprocessing pipeline files indicating the steps and configuration options involved in each preprocessing pipeline; and several model files indicating the details of each subject's experimental design as well as configuration options for SPM first-level analyses.

Subject files (data.cfg)

Within each subject directory a file named data.cfg defines the location and source of the original/raw data files for this subject (e.g. functional and anatomical files). It is recommended that the raw data files are also located within the same subject directory but this is not necessary (e.g. in scenarios where the raw files may be read-only and shared among multiple groups/researchers). A typical data.cfg file may contain the following information, for example, if your original/raw data is in NIFTI format:

SUBJECTS/subject0001/data.cfg

#functionals

/Volumes/ext/SUBJECTS/subject0001/raw/func_01.nii

/Volumes/ext/SUBJECTS/subject0001/raw/func_02.nii

#structurals

/Volumes/ext/SUBJECTS/subject0001/raw/anat_01.nii

#RT

2

example data.cfg file (see data.cfg documentation for additional details of this file format) 

or something like the following information, for example, if your original/raw data is in DICOM format:

SUBJECTS/subject0002/data.cfg

#dicoms

/Volumes/ext/SUBJECTS/subject0002/dicoms/*-3-1.dcm

/Volumes/ext/SUBJECTS/subject0002/dicoms/*-7-1.dcm

/Volumes/ext/SUBJECTS/subject0002/dicoms/*-13-1.dcm

#functionals

7 13

#RT

2.53

#structurals

3

example data.cfg file (see data.cfg documentation for additional details of this file format) 

Subject definition files may also indicate the source of fieldmap files, as well as ROI files. 
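To make the layout of the examples above concrete, here is a minimal Python sketch of a reader for this kind of file. It is not EL's own parser. It assumes only what the examples in this guide show: a line starting with # opens a field, the following non-empty lines are that field's values, a value may also follow the field name on the same line, and lines starting with % are comments. The function name parse_cfg is hypothetical.

```python
def parse_cfg(text):
    """Map each '#field' name to the list of value lines that follow it."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and '%' comment lines
        if line.startswith("#"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a lone '#' carries no field name
            current = parts[0]
            fields[current] = parts[1:]  # inline value, if any
        elif current is not None:
            fields[current].append(line)
    return fields


example = """
#functionals
/Volumes/ext/SUBJECTS/subject0001/raw/func_01.nii
/Volumes/ext/SUBJECTS/subject0001/raw/func_02.nii
#structurals
/Volumes/ext/SUBJECTS/subject0001/raw/anat_01.nii
#RT
2
"""
cfg = parse_cfg(example)
print(len(cfg["functionals"]), "functional runs, RT =", float(cfg["RT"][0]))
```

Values are kept as whole lines rather than split on whitespace, so file paths containing spaces stay intact; numeric fields such as RT are converted by the caller.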

The full documentation of the information and format of data.cfg files can be found in the data.cfg documentation. Unless otherwise specified by a different extension (e.g. .json), data.cfg files are assumed to use the [.cfg] file format.

Preprocessing pipeline files (pipeline_preproc.cfg)

Within the root.pipelines directory a number of files named pipeline_preproc_[pipeline-ID].cfg define different preprocessing pipelines. Each pipeline file defines a list of preprocessing steps to be executed sequentially. In addition, pipeline files may also specify the individual parameters/options of each step, assign labels to the functional data output by each preprocessing step to be referred to later, or jump back to continue preprocessing the functional data output by a previous step. An example of a default preprocessing pipeline in EL is the following:

conn/modules/el/pipeline_preproc_DefaultMNI_PlusStructural.cfg

#steps

structural_center

structural_segment&normalize

functional_label_as_original

functional_realign

functional_center

functional_art

functional_label_as_subjectspace

functional_segment&normalize_direct

functional_label_as_mnispace

functional_regression

functional_smooth

functional_label_as_minimallysmoothed

functional_smooth

functional_label_as_smoothed


#fwhm

4

6.9282


#reg_names

realignment

scrubbing

White Matter

CSF


#reg_dimensions

inf

inf

5

5


#reg_deriv

1

0

0

0


#reg_skip

1

one of the included preprocessing/denoising pipelines in EL (see preprocessing .cfg documentation for additional details about the format of these preprocessing .cfg files)

This pipeline uses direct normalization to segment and normalize the functional and anatomical data separately. In addition, it realigns the data, applies spatial smoothing, runs outlier identification to flag potential outlier scans, and estimates white matter and CSF components that may be used later for denoising.
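The two #fwhm values in the pipeline file above are easier to read with a small calculation. When Gaussian kernels are applied one after another, their widths combine in quadrature. The sketch below assumes that the i-th #fwhm value belongs to the i-th functional_smooth step, that the second smoothing step operates on the output of the first, and that the values are in millimeters; none of these assumptions is stated explicitly in the file itself.

```python
import math

fwhm_steps = [4.0, 6.9282]  # '#fwhm' values from the pipeline file above


def combined_fwhm(fwhms):
    """Effective FWHM of Gaussian kernels applied in sequence."""
    return math.sqrt(sum(f * f for f in fwhms))


# data labeled 'minimallysmoothed': after the first smoothing step only
minimally_smoothed = combined_fwhm(fwhm_steps[:1])
# data labeled 'smoothed': after both smoothing steps
smoothed = combined_fwhm(fwhm_steps)
print(round(minimally_smoothed, 3), round(smoothed, 3))
```

Under these assumptions the minimallysmoothed data carry 4 units of smoothing and the smoothed data very close to 8, since 6.9282 is approximately the square root of 48.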

The full documentation of the information and format of preprocessing pipeline files can be found in the preprocessing .cfg documentation. Unless otherwise specified by a different extension (e.g. .json), preprocessing pipeline files are assumed to use the [.cfg] file format.

Model pipeline files (pipeline_model.cfg)

Within the root.tasks directory a file named [model_ID].cfg (or when manually specifying model-options during runtime, within the root.pipelines directory a file named pipeline_model_[options_ID].cfg) describes the details of SPM 1st-level parameter-estimation procedure. An example of the default model pipeline file in EL is the following:

conn/modules/el/pipeline_model_Default.cfg

#functional_label 

minimallysmoothed


#model_basis

hrf+deriv


#model_covariates

motion

art


#model_serial

AR(1)


#hpf

128


one of the included sets of predefined model estimation options in EL (see model-estimation.cfg documentation for additional details of this model configuration file format)

These files may specify the source of functional data (e.g. minimallysmoothed label in the example above), temporal basis modeling the expected hemodynamic response function shape (e.g. hrf+deriv option in the example above), high- or low-pass filters or noise covariates such as scrubbing or aCompCor factors to be included during model estimation, etc. 

The full documentation of the information and format of these files can be found in the model-estimation.cfg documentation.

Model-design files (.para)

Within the root.tasks directory a series of files named [model-ID]_*.para (e.g. tasksAB_random1.para) contain the onset/duration/event-type information defining all individual events or blocks in one particular instantiation of an experimental design. A typical .para file may look like the following:

tasksAB_v1.para

% time units (scans/secs)

#units scans


% onset_time (in scans units; 0-indexing) / task_type

#onsets

0.00 1

4.00 2

8.00 1

13.00 1

16.00 1

20.00 1

24.00 1

30.00 1

36.00 2

42.00 2

45.00 2

48.00 1

52.00 1

55.00 2

60.00 2

65.00 2

70.00 1

75.00 2

79.00 1

82.00 1

85.00 2

88.00 1

94.00 1

97.00 1

104.00 1

107.00 2

111.00 2

115.00 2

120.00 2

126.00 1

131.00 2

134.00 2

138.00 2

143.00 1

147.00 2

153.00 2

160.00 2

163.00 1

166.00 1

172.00 2


% task names

#names

Speech 

NonSpeech


% task durations (in scans units)

#durations

3 3


example .para file (see design.para documentation for additional details) 
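The onset table in the example above can be regrouped into per-task onsets and durations expressed in seconds, which is the form a first-level design ultimately needs. The Python sketch below is illustrative only: the function name conditions_from_para is hypothetical, task types are assumed to be 1-based indices into the #names list, and scan units are assumed to convert to seconds by multiplying by the repetition time (RT).

```python
def conditions_from_para(onsets, names, durations, units="scans", rt=1.0):
    """Group (onset, task_type) rows by task name, converting to seconds."""
    scale = rt if units == "scans" else 1.0
    conditions = {
        name: {"onsets": [], "duration": duration * scale}
        for name, duration in zip(names, durations)
    }
    for onset, task_type in onsets:
        name = names[task_type - 1]  # assumed: type 1 is the first name
        conditions[name]["onsets"].append(onset * scale)
    return conditions


# first four rows of the #onsets field in the example above
rows = [(0.0, 1), (4.0, 2), (8.0, 1), (13.0, 1)]
conds = conditions_from_para(
    rows, ["Speech", "NonSpeech"], [3, 3], units="scans", rt=2.0
)
print(conds["Speech"])
print(conds["NonSpeech"])
```

The RT of 2 seconds used here is just an example value; in practice it would come from the #RT field of the subject's data.cfg file.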

Design information .para files may also indicate the timing of individual scans/acquisitions (for sparse acquisition designs), parametric modulators of task effects (e.g. reaction time), temporal modulation effects, etc. 

The full documentation of the information and format of these files can be found in the design.para documentation. The same .para file may apply to one or multiple subjects and one or multiple sessions, for example for different subjects or sessions where different variations or random orders of the same experimental tasks were used. The information of which .para files are associated with a particular subject/session is stored in subject-specific model-association files (see below). Unless otherwise specified by a different extension, .para files are assumed to use the [.cfg] file format.

Model-association files (.cat)

Within each subject directory a file named [model-ID].cat describes which experimental design should be used when analyzing a functional run. This association may differ across 1st-level analyses, for example when different 1st-level analyses require different ways to model the tasks or when different subsets of functional runs are used in different analyses. A typical .cat association file may look like the following example, defining two functional runs and their associated model design files:

tasksAB.cat

% functional runs included in tasksAB experiment

#runs 

2


% experimental designs used in these runs

#files

tasksAB_v1.para

tasksAB_v2.para

example .cat file (see design.cat documentation for additional details) 

Unless otherwise specified by a different extension, .cat files are assumed to use the [.cfg] file format

Model-contrast files (.con)

Within the root.tasks directory a series of text files named [model-ID].con define all of the contrasts that we would like to evaluate for each experimental design. Each contrast is defined in a separate line as a linear combination of modeled tasks (as defined in the #names fields of the model-design files for the same model-ID). A typical contrast file may contain the following information, for example:

contrasts_tasksAB.con

Speech-Baseline     Speech 1

NonSpeech-Baseline  NonSpeech 1

Speech-NonSpeech    Speech 1 NonSpeech -1

NonSpeech-Speech    Speech -1 NonSpeech 1

AverageTasks        Speech 0.5 NonSpeech 0.5

example contrast definition file (see model.con documentation for details) 
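Each line of the example above pairs a contrast name with a list of task/weight pairs. The Python sketch below shows how such a line maps onto a weight vector ordered like the #names field of the model-design file. It is a simplified illustration rather than EL's parser: the function name parse_contrast_line is hypothetical, and only whitespace-separated fields are handled (not .csv-style separators).

```python
def parse_contrast_line(line, names):
    """Turn 'label Task1 w1 Task2 w2 ...' into (label, weight vector)."""
    tokens = line.split()
    label, pairs = tokens[0], tokens[1:]
    if len(pairs) % 2 != 0:
        raise ValueError(f"expected task/weight pairs in: {line!r}")
    weights = [0.0] * len(names)
    for task, weight in zip(pairs[0::2], pairs[1::2]):
        # names.index raises ValueError if the task is not a modeled task
        weights[names.index(task)] += float(weight)
    return label, weights


names = ["Speech", "NonSpeech"]  # '#names' order in the .para example
lines = [
    "Speech-Baseline     Speech 1",
    "Speech-NonSpeech    Speech 1 NonSpeech -1",
    "AverageTasks        Speech 0.5 NonSpeech 0.5",
]
contrasts = dict(parse_contrast_line(line, names) for line in lines)
for label, weights in contrasts.items():
    print(label, weights)
```

Tasks not mentioned in a line get a weight of zero, so Speech-Baseline becomes a vector with a single non-zero entry.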

The full documentation of the information and format of these files can be found in the model.con documentation. Unless otherwise specified by a different extension, .con files are assumed to use any standard .txt/.csv/.tsv convention for separating the different fields in each line/contrast (note: for backward compatibility, model-contrast information for multiple models can also be stored in a single contrasts_by_expt.txt file within the root.tasks folder).

EL example: 

preprocessing functional and anatomical images

A typical sequence of commands to preprocess your functional/anatomical data would look like the following:

>> conn module el init;

>> el root.subjects /data/subjects;

>> el root.tasks /data/designs;

>> el preprocessing sub0001 DefaultMNI;

>> el preprocessing.qa sub0001 DefaultMNI;

repeating the el preprocessing and el preprocessing.qa commands for any additional subjects.

If necessary, additional preprocessing steps can be run a posteriori on an already preprocessed dataset. For example, after the el preprocessing command above, the following syntax would run an additional spatial smoothing step on a dataset that has already been preprocessed using the DefaultMNI pipeline:

>> el preprocessing.append sub0001 DefaultMNI OnlySmooth;

EL preprocessing pipelines are defined through .cfg files, and may use any of the preprocessing and denoising options available in CONN. EL includes several standard preprocessing pipelines already defined and tailored for task activation analyses (see conn/modules/el/pipeline_preproc_[pipeline_ID].cfg files for details), and users may modify these files and/or create their own pipelines. 

When using EL to preprocess your data, EL will create additional subdirectories within each subject directory containing the results of all preprocessing steps (these directories are named [pipeline_ID] after the name of preprocessing pipeline run). 

(see "help el" and "help evlab17_run_preproc" for additional details) 

example contents of SUBJECTS folder after preprocessing

model estimation (task activation analyses)

A typical sequence of commands to run first-level GLM task activation analyses of your functional data would look like the following:

>> el model sub0001 DefaultMNI tasksAB; 

>> el model.qa sub0001 DefaultMNI tasksAB;

repeating the el model and el model.qa commands for any additional subjects.

The results of the first-level analyses will be stored in a [subject_ID]/[preproc_ID]/results/firstlevel/[model_ID] directory named after the experimental design and contained within the results/firstlevel subdirectory of the preprocessed dataset. The contents of this directory are the standard SPM first-level analysis outputs, including an SPM.mat file containing the details of the estimated General Linear Model and which can be loaded in SPM, beta_*.nii files containing maps of estimated effect-sizes for each model regressor, as well as con_*.nii and spmT_*.nii files containing maps of estimated contrast values and T-statistics, respectively, for each specified first-level contrast.  These results can be displayed using the syntax:

>> el model.stats sub0001 DefaultMNI tasksAB;

example contents of SUBJECTS folder after first-level analyses

Additional first-level analysis details or options can be specified using a last argument to the "el model ..." command, pointing to a model configuration options file (either directly or indirectly to files located in conn/modules/el/pipeline_model_*.cfg), e.g.

>> el model sub0001 DefaultMNI tasksAB Default; 

(see "help el" and "help evlab17_run_model" for additional details)