# Cluster-level inferences

## Cluster-level inferences in voxel-based analyses

A standard second-level General Linear Model analysis of fMRI functional activation or fcMRI connectivity maps produces a single statistical parametric map, with one T- or F- value for each voxel in this map characterizing the effect of interest (e.g. difference in connectivity between two groups) at each location. When reporting or interpreting these results, rather than focusing on individual voxels, it is often convenient to focus on areas sharing similar effects or results. To support inferences about these areas, a number of methods have been developed that precisely specify how these areas/clusters are defined from the data, and how to assign statistics to each area/cluster in a way that allows us to make inferences about them while controlling the analysis-wise chance of false positives.

CONN implements three popular methods that offer family-wise error control at the level of individual clusters: 1) parametric statistics based on **Random Field Theory** (Worsley et al. 1996); 2) nonparametric statistics based on **permutation/randomization analyses** (Bullmore et al. 1999); and 3) nonparametric statistics based on **Threshold Free Cluster Enhancement** (Smith and Nichols 2009).

### Random Field Theory (RFT) parametric statistics

Cluster-level inferences based on Gaussian Random Field theory (Worsley et al. 1996) start with a statistical parametric map of T- or F- values estimated using a General Linear Model. This map is first thresholded using an a priori "height" threshold level (e.g. T>3 or p<0.001). The resulting suprathreshold areas define a series of non-overlapping clusters (neighboring voxels using an 18-connectivity criterion when analyzing 3D volumes). Each cluster is then characterized by its extent/size (number of voxels), and these sizes are compared to a known distribution of expected cluster sizes under the null hypothesis, as estimated from a combination of the analysis degrees of freedom, the approximated level of spatial autocorrelation of the general linear model residuals, and the selected height threshold level. The results are summarized, for each individual cluster, by a **cluster-level uncorrected p-value**, defined as the likelihood of a randomly-selected cluster having this size or larger under the null hypothesis, as well as a **cluster-level FWE-corrected p-value**, defined as the likelihood under the null hypothesis of observing at least one or more clusters of this or larger size over the entire analysis volume, and a **cluster-level FDR-corrected p-value** (topological False Discovery Rate, Chumbley et al. 2010), defined as the expected proportion of false discoveries among all clusters of this or larger size over the entire analysis volume, again under the null hypothesis.

A standard criterion ("*standard settings for cluster-based inferences #1: Random Field Theory parametric statistics*" in CONN's results explorer gui) for thresholding voxel-based functional activation or connectivity spatial parametric maps while appropriately controlling the family-wise error rate, uses RFT with a combination of an uncorrected p<0.001 height threshold to initially define clusters of interest from the original statistical parametric maps, and an FDR-corrected p<0.05 cluster-level threshold to select among the resulting clusters those deemed significant (those larger than what we could reasonably expect under the null hypothesis).
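The cluster-forming step described in this section (height threshold followed by grouping of neighboring voxels with an 18-connectivity criterion) can be sketched in a few lines. This is only an illustration under simple assumptions, not CONN's implementation: the function name `find_clusters`, the nested-list representation of the volume, and the toy 3x3x3 map are all hypothetical.

```python
from itertools import product

# 18-connectivity: neighbors that share a face or an edge with a voxel
# (offsets with one or two nonzero coordinates; corner neighbors are excluded)
OFFSETS_18 = [d for d in product((-1, 0, 1), repeat=3)
              if 1 <= sum(abs(c) for c in d) <= 2]

def find_clusters(stat_map, height):
    """Return clusters (lists of (i, j, k) voxels) where stat_map > height."""
    ni, nj, nk = len(stat_map), len(stat_map[0]), len(stat_map[0][0])
    supra = {(i, j, k)
             for i in range(ni) for j in range(nj) for k in range(nk)
             if stat_map[i][j][k] > height}
    clusters = []
    while supra:
        seed = supra.pop()
        cluster, stack = [seed], [seed]
        while stack:  # flood fill from the seed voxel
            i, j, k = stack.pop()
            for di, dj, dk in OFFSETS_18:
                nb = (i + di, j + dj, k + dk)
                if nb in supra:
                    supra.remove(nb)
                    cluster.append(nb)
                    stack.append(nb)
        clusters.append(sorted(cluster))
    return sorted(clusters, key=len, reverse=True)

# Toy 3x3x3 map of T-values (hypothetical numbers)
t_map = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
t_map[0][0][0] = 4.1
t_map[0][1][1] = 3.5   # shares an edge with (0,0,0): same cluster
t_map[1][2][2] = 5.0   # touches (0,1,1) only at a corner: separate cluster

clusters = find_clusters(t_map, height=3.0)
sizes = [len(c) for c in clusters]
print(sizes)  # [2, 1]
```

With a 26-connectivity criterion the three voxels would merge into a single cluster, which is why the choice of neighborhood is part of the cluster definition.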

### Randomization/permutation nonparametric statistics

Cluster-level inferences based on randomization/permutation analyses (Bullmore et al. 1999) in CONN use the same cluster-forming procedure as above, with two main differences. First, instead of relying on Random Field Theory assumptions to approximate the probability density function of each cluster size under the null hypothesis, these and related distributions are numerically estimated using multiple (1,000 or higher) randomization/permutation iterations of the original data designed to explicitly simulate the null hypothesis. For each of these iterations, the statistical parametric map of T- or F- values is computed and thresholded in the same way as in the original data, and the properties of the resulting clusters are combined to numerically estimate the desired probability density functions under the null hypothesis for our choice of cluster metrics. Second, instead of relying simply on cluster *size* to evaluate the significance level of each cluster, nonparametric analyses rely on cluster *mass* (the sum of the T-squared or F- statistics across all voxels within each cluster), defined as:

$$m_i = \sum_{x \in C_i} |h(x)|^H$$

where $m_i$ is the mass of the i-th cluster, $C_i$ is the set of voxels in that cluster, h(x) is the original statistical parametric map, and H=2 for T-statistic input maps, or H=1 for F-statistic input maps. Compared to cluster size, this measure can be expected to afford higher sensitivity not only to effects distributed over large areas but also to strong effects that may be concentrated over relatively small areas.
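To make the difference between cluster size and cluster mass concrete, the following minimal sketch reads the cluster mass as the sum of |h(x)|^H over the voxels of a cluster. The helper name `cluster_mass` and the T-values are hypothetical and chosen only for illustration.

```python
def cluster_mass(values, H):
    """Sum of |h(x)|**H over the voxels of one cluster (H=2 for T maps, H=1 for F maps)."""
    return sum(abs(v) ** H for v in values)

large_weak = [3.0] * 10     # 10 voxels with a modest effect (T = 3)
small_strong = [6.0] * 4    # 4 voxels with a strong effect (T = 6)

print(len(large_weak), cluster_mass(large_weak, H=2))      # 10 90.0
print(len(small_strong), cluster_mass(small_strong, H=2))  # 4 144.0
```

Ranked by size, the first cluster is the larger one; ranked by mass, the small but strong cluster comes first, which is the added sensitivity that the mass metric is meant to provide.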

As before, the results are summarized, for each individual cluster, by uncorrected, FWE-corrected, and FDR-corrected cluster-level p-values. **Uncorrected cluster-level p-values** are computed by comparing the mass of a given cluster with the observed distribution of cluster mass values across all clusters observed in the permutation/randomization iterations, **FDR-corrected cluster-level p-values** are computed using the standard Benjamini and Hochberg’s FDR algorithm using the estimated uncorrected p-values, and **FWE-corrected cluster-level p-values** are computed by comparing the mass of a given cluster with the distribution of the maximum/largest cluster mass across the entire analysis volume observed in each permutation/randomization iteration.

A second standard criterion ("*standard settings for cluster-based inferences #2: permutation/randomization analyses*" in CONN's results explorer gui) for thresholding voxel-based spatial parametric maps that also appropriately controls family-wise error rates, uses randomization/permutation analyses with a combination of an uncorrected p<0.01 height threshold in order to initially define clusters of interest, and an FDR-corrected p<0.05 cluster-level threshold to select among the resulting clusters those deemed significant (clusters with larger mass than what we could reasonably expect under the null hypothesis).
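As a rough illustration of how the three cluster-level p-values relate to each other, the sketch below computes them from a list of observed cluster masses and a set of null cluster masses, one list per permutation. It is not CONN's code: the function names (`bh_fdr`, `cluster_pvalues`), the toy numbers, the use of only five permutations, and the use of plain proportions (with no +1 correction) are assumptions made for illustration.

```python
def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):           # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def cluster_pvalues(observed, null_masses):
    """Uncorrected, FWE-corrected, and FDR-corrected p-values per observed cluster."""
    pooled = [m for perm in null_masses for m in perm]           # all null clusters
    maxima = [max(perm, default=0.0) for perm in null_masses]    # largest mass per iteration
    p_unc = [sum(m >= obs for m in pooled) / len(pooled) for obs in observed]
    p_fwe = [sum(m >= obs for m in maxima) / len(maxima) for obs in observed]
    return p_unc, p_fwe, bh_fdr(p_unc)

# Toy null distribution: cluster masses found in each of 5 permutations
# (real analyses use 1,000 or more; an empty list means no suprathreshold cluster)
null_masses = [[10.0, 20.0], [15.0], [], [30.0, 12.0, 18.0], [25.0]]
observed = [28.0, 14.0]

p_unc, p_fwe, p_fdr = cluster_pvalues(observed, null_masses)
print(p_fwe)  # [0.2, 0.8]
```

The uncorrected p-value uses every null cluster as a reference, while the FWE-corrected p-value only uses the single largest cluster of each iteration, which is what makes it a statement about the entire analysis volume.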

Note that one of the main advantages of this approach over alternatives such as RFT is that it remains valid for any user-defined choice of height threshold (e.g. p<0.01 as used here), while Random Field Theory assumptions expect the height threshold to be relatively conservative (e.g. p<0.001 or smaller, see Eklund et al. 2016). That makes permutation/randomization analyses well suited to small samples and/or low-powered studies where expected effects may be too weak to reasonably surpass conservative voxel-level height thresholds.

### Threshold Free Cluster Enhancement (TFCE) statistics

Cluster-level inferences based on Threshold Free Cluster Enhancement analyses (Smith and Nichols 2009) aim at removing the dependency of other cluster-level inference methodologies on the choice of an a priori cluster-forming height threshold. TFCE analyses in CONN start with a statistical parametric map of T- or F- values estimated using a General Linear Model. Instead of thresholding this map, a derived TFCE score map is computed as:

$$TFCE(x) = \int_{h_{min}}^{h(x)} e(h)^E \, h^H \, dh$$

where h(x) is the original statistical parametric map, and e(h) is the extent of a cluster thresholded at height level h and containing the point x. In CONN's implementation, TFCE model parameters are set by default to hmin=1, E=0.5, and H=2 for T-statistic input maps, or H=1 for F-statistic input maps, and the TFCE scores are computed using an exact integration method (dh→0 limit, unlike other implementations that use dh=0.1 discrete approximations).
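The idea of integrating exactly over height, rather than stepping in fixed dh increments, can be illustrated on a one-dimensional map. The sketch below assumes the standard TFCE definition (the integral, over heights from hmin to h(x), of e(h)^E times h^H) and uses the fact that e(h) only changes at heights equal to values present in the map, so the integral splits into pieces with a closed-form solution. The function name `tfce_1d` and the toy map are hypothetical, and this is not CONN's code, which operates on 3D volumes, surfaces, and ROI-to-ROI matrices.

```python
def tfce_1d(h, E=0.5, H=2.0, hmin=1.0):
    """TFCE scores for a 1D map of non-negative statistics, integrated exactly over height."""
    n = len(h)
    levels = sorted({v for v in h if v > hmin})   # heights where cluster extents can change
    scores = [0.0] * n
    lower = hmin
    for upper in levels:
        # For any height in (lower, upper] the suprathreshold set is {x : h[x] >= upper},
        # so e(height) is constant there and height**H integrates in closed form.
        weight = (upper ** (H + 1) - lower ** (H + 1)) / (H + 1)
        i = 0
        while i < n:
            if h[i] >= upper:
                j = i
                while j < n and h[j] >= upper:
                    j += 1
                extent = j - i                    # size of this run of neighboring points
                for x in range(i, j):
                    scores[x] += extent ** E * weight
                i = j
            else:
                i += 1
        lower = upper
    return scores

t_values = [0.0, 2.0, 3.0, 2.0, 0.0, 4.0]   # hypothetical 1D map of T-values
scores = tfce_1d(t_values)
print([round(s, 2) for s in scores])  # [0.0, 4.04, 10.37, 4.04, 0.0, 21.0]
```

The isolated point with T=4 receives the highest score because of its height, while the point with T=3 is boosted by the support of its two suprathreshold neighbors.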

The resulting TFCE scores at each voxel combine the strength of the statistical effect at this location with the extent of all clusters that would appear at this location when thresholding the original statistical parametric maps at any arbitrary height threshold. Then, as before, the expected distribution of TFCE values under the null hypothesis is numerically estimated using multiple (1,000 or higher) randomization/permutation iterations of the original data. Comparing at each voxel the observed TFCE value with the null-hypothesis distribution of maximum TFCE values across the entire analysis volume is used to compute **voxel-level FWE-corrected p-values** (defined as the likelihood under the null hypothesis of observing at least one or more voxels with this or larger TFCE scores over the entire analysis volume). Similarly, and following the approach in Chumbley et al. 2010, comparing each local-extremum/peak in the TFCE map with the null hypothesis distribution of local-peak TFCE values can be used to compute **peak-level uncorrected p-values**, defined as the likelihood under the null hypothesis of one randomly-selected peak in the TFCE map having this or larger scores, and associated **peak-level FDR-corrected p-values**, defined as the expected proportion of false discoveries across the entire analysis volume among peaks having this or larger TFCE scores.

Yet another standard criterion ("*standard settings for cluster-based inferences #3: Threshold Free Cluster Enhancement*" in CONN's results explorer gui) for thresholding voxel-based spatial parametric maps while appropriately controlling the family-wise error rate, uses TFCE analyses with an FWE-corrected p<0.05 voxel-level threshold to select among the resulting maps those areas deemed significant (including only voxels having TFCE scores larger than what we could reasonably expect under the null hypothesis).

## Cluster-level inferences in ROI-to-ROI analyses

A standard second-level General Linear Model analysis of fcMRI connectivity matrices produces a single statistical matrix of T- or F- values, characterizing the effect of interest (e.g. difference in connectivity between two groups) among all possible pairs of ROIs. As in the voxel-based case, when the number of ROIs is large (e.g. from an atlas defining hundreds of regions across the entire brain, which translates to tens of thousands of connections), rather than focusing on individual connections between all possible pairs of ROIs, it is often convenient to focus on groups of nearby or related connections sharing similar effects or results. To support inferences about these groups of connections, multiple methods have been developed that precisely specify how these groups/clusters can be defined from the data, and how to assign statistics to each of these groups/clusters in a way that allows us to make inferences about them while controlling the analysis-wise chance of false positives.

CONN implements three popular methods offering family-wise error control at the level of individual clusters: 1) parametric statistics based on **Functional Network Connectivity**; 2) nonparametric statistics based on **permutation/randomization analyses**; and 3) nonparametric statistics based on **Threshold Free Cluster Enhancement**.

### Functional Network Connectivity (FNC) multivariate parametric statistics

Cluster-level inferences based on multivariate statistics start by considering groups/networks of related ROIs. These networks can be manually defined by researchers (e.g. from an atlas or prior ICA decomposition), or they can be defined using a data-driven hierarchical clustering procedure (complete-linkage clustering, Sørensen 1948) based on ROI-to-ROI anatomical proximity and functional similarity metrics. Once networks of ROIs are defined, FNC analyzes the entire set of connections between all pairs of ROIs in terms of the *within*- and *between*- network connectivity sets (Functional Network Connectivity, Jafri et al. 2008), performing a multivariate parametric General Linear Model analysis for all connections included in each of these sets/clusters of connections. This results in an F- statistic for each pair of networks and an associated **uncorrected cluster-level p-value**, defined as the likelihood under the null hypothesis of a randomly selected pair of networks showing equal or larger effects than those observed between this pair of networks, and an **FDR-corrected cluster-level p-value** (Benjamini and Hochberg, 1995), defined as the expected proportion of false discoveries among all pairs of networks with similar or larger effects across the entire set of FNC pairs.

A standard criterion ("*standard settings for cluster-based inferences #1: parametric multivariate statistics*" in CONN's ROI-to-ROI results explorer gui) for thresholding ROI-to-ROI parametric maps while appropriately controlling the family-wise error rate, uses FNC with an FDR-corrected p<0.05 cluster-level threshold to select among all network-to-network connectivity sets those deemed significant (showing larger multivariate effects than what we could reasonably expect under the null hypothesis), together with a post-hoc uncorrected p<0.05 height (connection-level) threshold to help characterize the pattern of individual connections that show some of the largest effects within each significant set.
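The way FNC partitions the ROI-to-ROI matrix into within- and between-network sets of connections can be sketched as follows. This only illustrates the grouping step, not the multivariate test applied to each set afterwards; the function name `fnc_connection_sets`, the ROI names, the network labels, and the assumption of undirected connections are all hypothetical choices made for the example.

```python
from itertools import combinations

def fnc_connection_sets(network_of_roi):
    """Group all ROI-to-ROI connections by the pair of networks they link."""
    sets = {}
    for a, b in combinations(sorted(network_of_roi), 2):
        # within-network sets have two equal labels, between-network sets two different ones
        key = tuple(sorted((network_of_roi[a], network_of_roi[b])))
        sets.setdefault(key, []).append((a, b))
    return sets

# Hypothetical assignment of five ROIs to two networks
labels = {"PCC": "DMN", "MPFC": "DMN", "LP": "DMN", "IPS": "DAN", "FEF": "DAN"}
sets = fnc_connection_sets(labels)

for key, conns in sorted(sets.items()):
    print(key, len(conns))
# ('DAN', 'DAN') 1
# ('DAN', 'DMN') 6
# ('DMN', 'DMN') 3
```

Each of the three sets would then receive its own multivariate statistic, so the number of tests entering the FDR correction is the number of network pairs rather than the number of individual connections.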

### Randomization/permutation Spatial Pairwise Clustering (SPC) statistics

Cluster-level inferences based on randomization/permutation ROI-to-ROI analyses in CONN use the general approach known as Spatial Pairwise Clustering (Zalesky et al. 2012). It starts with the entire ROI-to-ROI matrix of T- or F- statistics estimated using a General Linear Model, forming a two-dimensional statistical parametric map. ROIs in this matrix are sorted either manually by the user (e.g. from an atlas), or automatically using a hierarchical clustering procedure (optimal leaf ordering for hierarchical clustering, Bar-Joseph et al. 2001) based on ROI-to-ROI anatomical proximity or functional similarity metrics. Then this statistical parametric map is thresholded using an a priori "height" threshold (e.g. T>3 or p<0.001). The resulting suprathreshold areas define a series of non-overlapping clusters (groups of neighboring connections using an 8-connectivity criterion on the upper triangular part of the symmetrized suprathreshold matrix). Each cluster is then characterized by its *mass* (sum of F- or T-squared statistics over all connections within each cluster), and these values are compared to a distribution of expected cluster mass values under the null hypothesis, which is numerically estimated using multiple (1,000 or higher) randomization/permutation iterations of the original data. For each of these iterations, the new statistical parametric map of T- or F- values is computed and thresholded in the same way as in the original data, and the properties of the resulting suprathreshold clusters are combined to numerically estimate the probability density under the null hypothesis for our choice of cluster metric. The results are summarized, for each individual cluster or group of connections, by **uncorrected cluster-level p-values**, representing the likelihood of a randomly-selected cluster of connections having this or larger mass under the null hypothesis, **cluster-level FWE-corrected p-values**, defined as the likelihood under the null hypothesis of finding one or more clusters with this or larger mass across the entire set of ROI-to-ROI connections, and **cluster-level FDR-corrected p-values**, defined as the expected proportion of false discoveries among clusters having this or larger mass across the entire set of ROI-to-ROI connections.

A second standard criterion ("*standard settings for cluster-based inferences #2: Spatial Pairwise Clustering statistics*" in CONN's ROI-to-ROI results explorer gui) for thresholding ROI-to-ROI parametric maps that also appropriately controls family-wise error rates, uses randomization/permutation analyses with a combination of an uncorrected p<0.01 height threshold in order to initially define clusters of interest, and an FDR-corrected p<0.05 cluster-level threshold to select among the resulting clusters those deemed significant (clusters with larger mass than what we could reasonably expect under the null hypothesis).
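The cluster-forming step of Spatial Pairwise Clustering (thresholding the ROI-to-ROI matrix and grouping neighboring suprathreshold connections with an 8-connectivity criterion on the upper triangle) can be sketched as below. This is an illustration under simplifying assumptions rather than CONN's implementation: the function name `spc_clusters` is hypothetical, the input matrix is assumed to be symmetric and already sorted, and a one-sided positive threshold is used.

```python
def spc_clusters(t_matrix, height, H=2):
    """Clusters of neighboring suprathreshold connections, with their masses."""
    n = len(t_matrix)
    # suprathreshold connections in the upper triangle (i < j) of a symmetric matrix
    supra = {(i, j) for i in range(n) for j in range(i + 1, n)
             if t_matrix[i][j] > height}
    results = []
    while supra:
        seed = supra.pop()
        cluster, stack = [seed], [seed]
        while stack:  # flood fill over the 8 neighboring matrix entries
            i, j = stack.pop()
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    nb = (i + di, j + dj)
                    if nb in supra:
                        supra.remove(nb)
                        cluster.append(nb)
                        stack.append(nb)
        mass = sum(abs(t_matrix[i][j]) ** H for i, j in cluster)
        results.append((sorted(cluster), mass))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy symmetric 5x5 matrix of T-values (hypothetical numbers)
n = 5
t = [[0.0] * n for _ in range(n)]
for (i, j), v in {(0, 1): 4.0, (0, 2): 3.5, (1, 2): 4.5, (3, 4): 5.0}.items():
    t[i][j] = t[j][i] = v

spc = spc_clusters(t, height=3.0)
print(spc)
# [([(0, 1), (0, 2), (1, 2)], 48.5), ([(3, 4)], 25.0)]
```

Because neighborhood is defined on matrix positions, the clusters found depend on the order of the ROIs, which is why the sorting step precedes the thresholding.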

### Threshold Free Cluster Enhancement (TFCE) statistics

Cluster-level inferences based on Threshold Free Cluster Enhancement analyses (Smith and Nichols 2009) in CONN can also be used in the context of ROI-to-ROI connectivity matrices. Similarly to SPC analyses, TFCE starts with the entire ROI-to-ROI matrix of T- or F- statistics estimated using a General Linear Model, with ROIs again sorted either manually by the user (e.g. from an atlas), or automatically using a hierarchical clustering procedure (optimal leaf ordering for hierarchical clustering, Bar-Joseph et al. 2001) based on ROI-to-ROI anatomical proximity or functional similarity metrics. Instead of thresholding this map using an a priori height threshold, TFCE analyses proceed by computing the associated TFCE score map, combining the strength of the statistical effect for each connection with the extent of all clusters or groups of neighboring connections that would appear at this location when thresholding the original statistical parametric maps at any arbitrary height threshold. Then, as before, the expected distribution of TFCE values under the null hypothesis is numerically estimated using multiple (1,000 or higher) randomization/permutation iterations of the original data, and used to compute for each cluster in the original analysis a **peak-level FWE-corrected p-value** (defined as the likelihood under the null hypothesis of observing at least one or more connections with this or larger TFCE scores over the entire ROI-to-ROI connectivity matrix). Similarly, and following the approach in Chumbley et al. 2010, each local-extremum/peak in the TFCE map is compared to the null hypothesis distribution of local-peak TFCE values to estimate a **peak-level uncorrected p-value**, representing the likelihood under the null hypothesis of one randomly-selected peak in the TFCE map having this or larger scores, and an associated **peak-level FDR-corrected p-value**, defined as the expected proportion of false discoveries among peaks having this or larger TFCE scores across the entire ROI-to-ROI matrix.

Yet another standard criterion ("*standard settings for cluster-based inferences #3: Threshold Free Cluster Enhancement statistics*" in CONN's ROI-to-ROI results explorer gui) for thresholding ROI-to-ROI parametric maps while appropriately controlling the family-wise error rate, uses TFCE analyses with an FWE-corrected p<0.05 connection-level threshold to select among the resulting maps those groups of connections deemed significant (including only connections having TFCE scores larger than what we could reasonably expect under the null hypothesis).

### Alternatives to cluster-level inferences: connection-, ROI-, and network- level inferences

In addition to the three main approaches listed above, CONN also implements alternative methods focusing on other units of inferential analyses, ranging from individual connections to entire networks of connections.

The first approach ("*alternative settings for connection-based inferences, parametric univariate statistics*" in CONN's ROI-to-ROI results explorer gui) allows users to make inferences about individual connections, rather than focusing on groups of connections. In order to control family-wise error rates, this approach simply uses the standard Benjamini and Hochberg’s FDR algorithm to compute for each individual connection (between all pairs of ROIs) a **connection-level FDR-corrected p-value**, defined as the expected proportion of false discoveries among all connections with effects larger than this one across the entire ROI-to-ROI matrix. The default criterion uses a connection-level FDR-corrected p<0.05 threshold to select among all connections those deemed significant (with larger effects than what we could reasonably expect under the null hypothesis).

The second approach ("*alternative settings for ROI-based inferences, parametric multivariate statistics*" in CONN's ROI-to-ROI results explorer gui) allows users to make inferences about individual ROIs. This approach uses the same general strategy as FNC but instead of defining sets/clusters of connections based on a data-driven clustering approach, it explicitly defines a different set/cluster of connections for each row of the ROI-to-ROI matrix, grouping all connections that arise from the same ROI as a new set/cluster. It then performs a multivariate parametric General Linear Model analysis for all connections included in each of these new sets/clusters of connections. This results in an F- statistic for each individual ROI and an associated **uncorrected ROI-level p-value**, defined as the likelihood under the null hypothesis of a randomly selected ROI showing equal or larger effects than those observed at this ROI, and an **FDR-corrected ROI-level p-value** (Benjamini and Hochberg, 1995), defined as the expected proportion of false discoveries among all ROIs with similar or larger effects across the entire set of ROIs included in the original analysis. The default criterion in CONN for this approach uses an ROI-level FDR-corrected p<0.05 threshold in order to select among all ROIs those deemed significant (with larger effects than what we could reasonably expect under the null hypothesis), together with a post-hoc uncorrected p<0.01 height (connection-level) threshold to help characterize the pattern of individual connections that show some of the largest effects from each significant ROI.

The third approach ("*alternative settings for network-based inferences, Network Based Statistics*" in CONN's ROI-to-ROI results explorer gui) allows users to make inferences about entire networks of ROIs. CONN uses here the approach known as Network Based Statistics (Zalesky et al. 2010). Similar to SPC, it starts with the entire ROI-to-ROI matrix of T- or F- statistics estimated using a General Linear Model, forming a two-dimensional statistical parametric map. Unlike SPC or TFCE, the order of ROIs in this matrix is not relevant to these analyses. This statistical parametric map is then thresholded using an a priori "height" threshold (e.g. T>3 or p<0.001). The resulting suprathreshold connections define a graph among all nodes/ROIs. This graph is then broken down into components/networks, defined as connected subgraphs, and from here the procedure continues in the same way as SPC, but using networks instead of clusters as a basis to group multiple connections into a set. Each network is then characterized by its network *mass* (sum of F- or T-squared statistics over all connections within each network), and these values are compared to a distribution of expected network mass values under the null hypothesis, which is numerically estimated using multiple (1,000 or higher) randomization/permutation iterations of the original data. Results are summarized, for each individual network or group of connections, by **uncorrected network-level p-values**, representing the likelihood of a randomly-selected network having this or larger mass under the null hypothesis, **network-level FWE-corrected p-values**, defined as the likelihood under the null hypothesis of finding one or more networks with this or larger mass across the entire set of ROI-to-ROI connections, and **network-level FDR-corrected p-values**, defined as the expected proportion of false discoveries among networks having this or larger mass across the entire set of ROI-to-ROI connections. The default criterion in CONN for this approach uses a combination of an uncorrected p<0.001 height threshold in order to initially define networks of interest, and an FDR-corrected p<0.05 network-level threshold to select among the resulting networks those deemed significant (networks with larger mass than what we could reasonably expect under the null hypothesis).

Last, in addition to the default and alternative methods described above it is also possible ("*advanced family-wise error control settings*" in CONN's voxel-based or ROI-to-ROI results explorer gui) to use different choices and combinations of thresholds for any of the general procedures listed above, allowing a wide variety of possible analysis strategies. All default criteria in CONN are defined to ensure proper **analysis-wise** error control (i.e. appropriately control for all multiple comparisons within each second-level analysis) while affording reasonable sensitivity in most scenarios, but researchers are encouraged to explore different settings and use the method most appropriate to the specificities of their study. In order to avoid "p-hacking" (trying multiple combinations of analysis options but only reporting the one that works best) we strongly recommend researchers select the desired inferential approach and parameter choices a priori (e.g. based on pilot data, prior studies, or the literature most related to their sub-field), and/or attempt to appropriately control for all different analyses that have been run by applying an additional Bonferroni- or FDR- correction to the observed *cluster*-level p-values (e.g. to convert from "analysis-wise" FWE control to "study-wise" FWE control).
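Of the alternatives covered in this section, the network-forming step of Network Based Statistics is the one that differs most from the matrix-neighborhood clustering used by SPC, so a small sketch may help. It groups suprathreshold connections into connected components of a graph with a union-find structure and computes each network's mass. The function name `nbs_networks`, the toy matrix, the symmetric input, and the one-sided positive threshold are assumptions for illustration, not CONN's code.

```python
def nbs_networks(t_matrix, height, H=2):
    """Connected components of the suprathreshold graph, with their masses."""
    n = len(t_matrix)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if t_matrix[i][j] > height]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, j in edges:                      # merge the two ROIs joined by each connection
        parent[find(i)] = find(j)

    networks = {}
    for i, j in edges:                      # collect connections by component
        networks.setdefault(find(i), []).append((i, j))
    return sorted(((conns, sum(abs(t_matrix[i][j]) ** H for i, j in conns))
                   for conns in networks.values()),
                  key=lambda r: r[1], reverse=True)

# Toy symmetric 6x6 matrix of T-values (hypothetical numbers)
n = 6
t = [[0.0] * n for _ in range(n)]
for (i, j), v in {(0, 3): 4.0, (3, 5): 3.5, (1, 2): 5.0}.items():
    t[i][j] = t[j][i] = v

nbs = nbs_networks(t, height=3.0)
print(nbs)
# [([(0, 3), (3, 5)], 28.25), ([(1, 2)], 25.0)]
```

Connections (0, 3) and (3, 5) are not neighbors in the matrix, but they share ROI 3 and therefore belong to the same network, which illustrates why the order of ROIs does not matter in this approach.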

## References

Bar-Joseph, Z., Gifford, D. K., & Jaakkola, T. S. (2001). Fast optimal leaf ordering for hierarchical clustering. *Bioinformatics*, *17*(suppl_1), S22-S29.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. *Journal of the Royal statistical society: series B (Methodological)*, *57*(1), 289-300.

Bullmore, E. T., Suckling, J., Overmeyer, S., Rabe-Hesketh, S., Taylor, E., & Brammer, M. J. (1999). Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. *IEEE transactions on medical imaging*, *18*(1), 32-42.

Chumbley, J., Worsley, K., Flandin, G., & Friston, K. (2010). Topological FDR for neuroimaging. *Neuroimage*, *49*(4), 3057-3064.

Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. *Proceedings of the national academy of sciences*, *113*(28), 7900-7905.

Jafri, M. J., Pearlson, G. D., Stevens, M., & Calhoun, V. D. (2008). A method for functional network connectivity among spatially independent resting-state components in schizophrenia. *Neuroimage*, *39*(4), 1666-1681.

Smith, S. M., & Nichols, T. E. (2009). Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. *Neuroimage*, *44*(1), 83-98.

Sørensen, T. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. *Biologiske Skrifter / Kongelige Danske Videnskabernes Selskab*, *5*, 1-34.

Worsley, K. J., Marrett, S., Neelin, P., Vandal, A. C., Friston, K. J., & Evans, A. C. (1996). A unified statistical approach for determining significant signals in images of cerebral activation. *Human brain mapping*, *4*(1), 58-73.

Zalesky, A., Fornito, A., & Bullmore, E. T. (2010). Network-based statistic: identifying differences in brain networks. *Neuroimage*, *53*(4), 1197-1207.

Zalesky, A., Fornito, A., & Bullmore, E. T. (2012). On the use of correlation as a measure of network connectivity. *Neuroimage*, *60*(4), 2096-2106.

## How to use cluster-level inferences in CONN

CONN's cluster-level inferences can be run using any of the following options:

### Option 1: using CONN's gui

If you have analyzed your data in CONN, follow these instructions to define your second-level analysis. After that simply click on '**Results explorer**' to have CONN launch the corresponding (voxel-level or ROI-to-ROI) results explorer window. The menu at the top of this window allows you to choose between the different standard criteria appropriate for this analysis (e.g. choose the *standard settings for cluster-based inferences #1: Random Field Theory parametric statistics* option for cluster-level inferences based on Random Field Theory (RFT) parametric statistics in voxel-based analyses; alternatively choose the option labeled '*advanced Family-Wise Error control settings*' for advanced options). Any significant clusters, using the selected cluster-level inferential approach, will be displayed and their associated cluster-level properties and statistics listed in table form below.

### Option 2: using CONN's commands

If you have already defined and run the corresponding second-level GLM analysis (either from the gui or using CONN batch/modular commands; otherwise start with the instructions in the General Linear Model section to define your second-level analysis) you may use the following syntax:

`conn display /myresults/SPM.mat`

to launch the results explorer window for the voxel-based, surface-based, or ROI-to-ROI analyses saved in the directory /myresults (see *doc conn_display* for additional options). As before, the menu at the top of the corresponding results explorer window will allow you to choose between the different standard criteria appropriate for each analysis.