Cluster/HPC configuration

In computer clusters or High-Performance Computing (HPC) environments, CONN can automatically distribute your processing/analyses in parallel across multiple nodes/CPUs. This can result in a very significant reduction in processing time and, given enough available nodes/CPUs, it allows hundreds or thousands of subjects to be processed and analyzed in the time it would normally take to process just one or a few subjects. CONN automatically handles all of the complexities associated with dividing each processing step into multiple jobs, submitting and tracking those jobs, and merging their results, independently of the underlying architecture or job-scheduler technology used (e.g. a distributed cluster environment, or a single multi-processor machine).

Example of use

To use your cluster computing resources from CONN's graphical interface, simply select the 'distributed processing' option when starting your analysis/processing steps.

For example, when clicking on the button labeled 'Done' in CONN's Setup tab, you may choose the "distributed processing (run on Background)" option to run this step as a background process on your local machine, then click on the 'Start' button and select 4 as the desired number of jobs. CONN will divide the required processing steps into four batches, each covering approximately one fourth of the subjects in your study, and run those four batches as background processes on your local machine, each using a single CPU. When those processes finish, CONN will collect and merge their results back into your CONN project.

If, instead of using CONN's GUI, you prefer to access the exact same functionality from batch scripts, simply include a parallel field in your batch structure, describing there the number of jobs and the desired parallelization profile. For example, the Matlab command:

conn_batch('Setup.done',true, 'Setup.overwrite',true, 'parallel.N',4, 'parallel.profile','Background process (Unix,Mac)');

will perform the exact same steps as in the GUI example above, running CONN's Setup processing pipeline in the background using four separate processes/batches. 

Last, if instead of a single multi-core computer you are running CONN from a computer that is part of a larger cluster, you can use the exact same options as in the examples above, choosing instead the appropriate cluster scheduler (e.g. choose the "distributed processing (run on Grid Engine computer cluster)" option in the GUI if your cluster uses a Grid Engine job scheduler). CONN will then automatically submit those same batches to your cluster scheduler as four different jobs, track their status in the queue, and merge their results accordingly when they finish.
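The equivalent batch syntax only requires changing the parallel.profile value. As a minimal sketch (the profile name used here is an assumption, and should match one of the profile names listed in your Tools.Cluster/HPC Settings window):

conn_batch('Setup.done',true, 'Setup.overwrite',true, 'parallel.N',4, 'parallel.profile','Grid Engine computer cluster');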

All of CONN's processing steps, including the functional and structural data preprocessing pipelines, denoising, and all first-level functional connectivity analyses available in CONN, can be parallelized using exactly the same GUI or batch options as in the examples above (see the preprocessing sketch below). To configure CONN parallelization options in a distributed cluster, HPC, or any multi-processor environment (including regular Mac, Windows, or Linux machines), follow the steps described in the sections below.
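For instance, a batch script that runs CONN's default preprocessing pipeline split across eight parallel jobs might look as follows (a minimal sketch only: the project filename, number of jobs, and profile name are placeholders/assumptions to be adapted to your own project and system):

% sketch: parallelize CONN's default MNI-space preprocessing pipeline
clear batch;
batch.filename = '/data/myproject/conn_myproject.mat';   % hypothetical CONN project file
batch.Setup.preprocessing.steps = 'default_mni';         % default preprocessing pipeline
batch.Setup.done = true;                                 % run the Setup step when finished
batch.Setup.overwrite = true;
batch.parallel.N = 8;                                    % number of parallel jobs
batch.parallel.profile = 'Slurm computer cluster';       % assumed profile name; see Tools.Cluster/HPC Settings
conn_batch(batch);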

Basic configuration settings in distributed cluster or multi-processor environments

Pre-defined cluster configuration options are available for the following schedulers/environments:

 Grid Engine : for Sun/Oracle Grid Engine, StarCluster, Open Grid Scheduler, or compatible

               e.g. Amazon EC2, NITRC-CE, Boston University SCC, UCLA Hoffman


 Slurm       : for Simple Linux Utility for Resource Management, or compatible

               e.g. MIT Openmind, NIH Biowulf, Berkeley Savio, Princeton Tiger, Stanford Sherlock


 PBS/Torque  : for Portable Batch System, or compatible

               e.g. MIT Mindhive, MGH Launchpad, Yale Omega


 LSF         : for Platform Load Sharing Facility, OpenLava, or compatible

               e.g. HMS Orchestra, Yale Grace


 HTCondor    : for HTCondor high-throughput computing framework, or compatible

               e.g. UW-Madison CHTC, OvGU-Magdeburg Medusa 


 Background  : for running multiple single-processor background jobs in your local machine

               e.g. any Mac, Windows, or Linux multi-processor system

If your cluster/HPC environment is one of the above, chances are that CONN's parallelization options will work right out of the box, without requiring any additional configuration beyond simply selecting which one of the above schedulers/environments CONN should use. To select your preferred scheduler/environment after installing CONN (either the Matlab or the standalone release), you may use either of the following two methods:

   Method 1: using the GUI (recommended)

In CONN's GUI, select the Tools.Cluster/HPC Settings menu.

Then select one of the default configuration profiles (Grid Engine, Slurm, PBS/Torque, LSF, HTCondor, or Background) and click 'Test profile'. During a test, CONN will attempt to submit simple jobs using the specified configuration options; it will then track the jobs' evolution and evaluate whether they finish correctly. This test may take up to a few minutes to complete. If you see a 'Test finished correctly' message, the configuration options are working correctly.

note: if you see a 'failed' message, select 'Advanced options' and 'See log' to look at the different logs recorded by CONN during this test and evaluate the potential reason for the failure. Then select 'Cancel job', make the appropriate changes to the profile settings, and select 'Test profile' to try again.

After successfully testing the cluster configuration settings, simply select this as your default profile and click 'Save' to save these settings for future sessions and/or for other users.

   Method 2: using Matlab or Linux commands (for text-only environments)

Type the following command to configure your system to use the pre-defined configuration options for Slurm clusters, and have those settings apply to all CONN users:

conn jobmanager setdefault 'Slurm' save all

note: if using CONN's Matlab release, type the above syntax in the Matlab command window; if using CONN's standalone release, type the same syntax in your Linux terminal command line.

The "save all" option in the command above will store configuration settings in your CONN installation folder (this requires write permissions into this folder). This configuration settings will then be available to all users that use CONN in your cluster. Using "save current" instead will store configuration settings in the current user home folder ~/ and these configuration settings will then be available only to this user. See help conn_jobmanager for additional options. 

You may also use the following command to have CONN test your cluster configuration settings:

conn_jobmanager test

or the following command to display all your current scheduler configuration options

conn_jobmanager options

or to modify those options, e.g.

conn_jobmanager options cmd_submitoptions '-t 24:00:00 --mem=16Gb'

see "help conn_jobmanager" and remote commands for additional details

Advanced configuration settings in distributed cluster / HPC environments

In some cases, depending on the specifics of your cluster/HPC environment, some further configuration might be needed. The following sections describe advanced configuration options, meant primarily for administrators looking to install CONN in a cluster environment serving multiple users. These options cover flexible customization of the scripts used by CONN to submit jobs, allowing administrators to adapt them to a wide variety of cluster environments and user needs.

Using the field 'in-line additional submit options' to define optional job specifications

In-line additional submit options can be used by system administrators to facilitate user-specific edits to a general-purpose parallelization profile. For example, the default command to submit a job in a 'Grid Engine' environment is:

qsub -N JOBLABEL -e STDERR -o STDOUT OPTS SCRIPT

When CONN needs to submit a job, it will automatically replace the keywords JOBLABEL, STDERR, STDOUT, OPTS, and SCRIPT with their appropriate values (JOBLABEL is a unique label for each job; STDERR/STDOUT are the standard error and standard output log files; SCRIPT is the name of each submitted script file; and OPTS is any user-defined string), and execute the resulting string as an OS-level command. The string that replaces the keyword OPTS can be defined by any CONN user as part of the cluster/HPC configuration step, either from the 'additional submit settings (optional) in-line' field in the GUI, or programmatically using the cmd_submitoptions field (see help conn_jobmanager; see batch.parallel.cmd_submitoptions when using batch commands). While this field is generally left empty, individual users may edit it to add their desired configuration options to the qsub command. For example, a user may enter in the cmd_submitoptions field the line:

-l h_rt=24:00:00 -m ae

in order to request a 24-hour walltime and email notifications for their jobs.
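The same options can also be defined programmatically. For example, a minimal sketch using the batch.parallel.cmd_submitoptions field mentioned above (the project filename is a placeholder):

% sketch: add Grid Engine walltime/email options to all submitted jobs
clear batch;
batch.filename = '/data/myproject/conn_myproject.mat';        % hypothetical CONN project file
batch.Setup.done = true;
batch.parallel.N = 4;
batch.parallel.cmd_submitoptions = '-l h_rt=24:00:00 -m ae';  % same options as in the example above
conn_batch(batch);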

In addition, administrators may also edit the cmd_submitoptions field and add a '?' symbol at the very end of the string to indicate that user interaction is required. When using this option, administrators can add any number of tokens of the form [<question text>:<default response text>] within the cmd_submitoptions field. CONN will in turn request user input when composing the appropriate qsub command, before submitting jobs to the queue. For example, entering in the cmd_submitoptions field the line:

-q [queue name:myqueue] -A [account name:myaccount]?

will have CONN query users for their queue and account names before submitting jobs, and then automatically include the user responses as part of a final qsub OS-command of the form:

qsub ... -q myqueue -A myaccount ...

Using the field 'additional submit settings (optional) in-file' for additional configuration and/or initialization steps

Typically, an example script generated by CONN (Matlab release), submitted to your job scheduler and then run by an individual node/CPU, may look like this:

#!/bin/bash

/usr/local/apps/matlab-2013a/bin/matlab -nodesktop -nodisplay -nosplash -singleCompThread -logfile '/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408/node.0001161013132422408.stdlog' -r "addpath /project/busplab/software/spm12; addpath /project/busplab/software/conn; cd /projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408; conn_jobmanager('rexec','/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/161013132422408/node.0001161013132422408.mat'); exit"

echo _NODE END_

while an example script generated by CONN (standalone release) may look like this:

#!/bin/bash

/share/pkg/conn_standalone/R2017a/run_conn.sh /share/pkg/mcr/9.2/install/v92 jobmanager rexec '/projectnb/busplab/connectomedb/conn_vol/conn_hcp.qlog/170605215826702/node.0001170605215826702.mat'

echo _NODE END_

These scripts simply call either the Matlab or the standalone CONN executable, prompting it to run a node-specific process (with the details of this process contained in the referenced .mat files).

When configuring HPC/Cluster settings, any commands defined in the 'in-file additional submit options' field will be automatically added to these scripts as additional lines right after the #!/bin/bash line (and before the line invoking Matlab or standalone CONN). This can be used for a variety of purposes. Some job schedulers are able to automatically interpret configuration settings from these initial lines: for example, adding the lines

#$ -m ae 

#$ -M myname@gmail.com 

may be used in a Grid Engine environment to request the job scheduler to send an email when jobs are aborted or end normally. 

In addition, you may also simply add your own initialization routines in these lines, such as:

module load mcr/9.2

module load conn_standalone/R2017a

to have the individual cluster nodes load the MCR and CONN modules before executing the requested CONN jobs.

Using user-specific cluster-configuration settings for added flexibility

User-specific configuration settings (those saved to a user's home folder, e.g. using the "save current" option) take precedence over global configuration settings (those saved to the CONN installation folder, e.g. using the "save all" option). Administrators may define global cluster-configuration settings that users can then fine-tune to their specific requirements (e.g. a user may change a general "-A [account]?" value to "-A brainlab" and save the new settings using the "save current" option, so that CONN stops asking for an account name when submitting jobs).
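As a command-line sketch of this last scenario (assuming the conn_jobmanager syntax shown in the examples above; see help conn_jobmanager for the exact options available for editing and saving profile settings):

conn_jobmanager options cmd_submitoptions '-A brainlab'

conn jobmanager setdefault 'Grid Engine' save current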

Notes on mixed Matlab+standalone environments

If both the Matlab and standalone CONN releases are installed on your system, by default CONN will have individual cluster nodes use the same release as the job-submitting node. For example, preprocessing your data using parallelization options from CONN's standalone release will have all involved cluster nodes use the same standalone release, while running the same procedure from the Matlab release will have all nodes use the Matlab release. If you prefer submitted jobs to always use the standalone release, independently of the release used to start the parallelization procedure (e.g. to avoid depleting available Matlab licenses while still allowing users to run CONN's Matlab release interactively), simply check the 'nodes use pre-compiled CONN only' field when defining your cluster configuration settings.