Towards Tunable Consensus Clustering for Studying Functional Brain Connectivity During Aﬀective Processing



Introduction
After the rapid rise of human neuroimaging in the 90s ("the decade of the brain") 1 with funding initiatives, papers in such prominent journals as Nature and Science, and lavish attention from the press, there is currently a vein of skepticism towards neuroimaging findings. Papers making the headlines announced significant brain activity in a dead salmon or evidenced implausibly high correlations between behavioral and brain data. 2 Although the fascination with brain data can still blur critical thinking in front of a crudely built mock brain scanner, 3 scientists are questioning the reliability of neuroimaging and the danger of false positives and reverse inference, which compromise the relevance of a whole field for the general scientific community. 4 A recent main criticism concerns the wide variety of analysis strategies, combined with small sample sizes, used to investigate regional brain activity measured with functional magnetic resonance imaging (fMRI), leading to inconsistent findings. 5 To overcome the limitations of the models imposed by researchers in model-based methods, data-driven methods such as clustering, independent component analysis (ICA), seed-based functional connectivity analysis (for reviews, cf. Ref. 6), and inter-subject correlation analysis 7 are increasingly adopted. However, new algorithms are frequently proposed, augmenting the discrepancies in the results and the difficulty of choosing the most appropriate data-driven method. Calls for consensus in analysis and meta-analysis methods for neuroimaging have been made. 8 In research fields characterized by the sheer complexity of the stimulus parameters and the subjectivity of the individual mental state, such as the investigation of musical emotions, the aforementioned difficulties are reflected even more in the wide discrepancy of results.
In a recent meta-analysis of 21 fMRI studies on musical emotions, the amygdala, 9 the anterior cingulate cortex, the insula, 10 the orbitofrontal cortex 11,12 and the reward circuit 13 were found to be associated with any musical emotion. While this meta-analysis gives a consistent view, the picture becomes more fragmented when looking at different types of emotions. Only very recently have data-driven methods, such as graph theory, 14 eigenvector centrality mapping, 15 network science [16][17][18] and ICA, 19 been adopted, and only marginally, to investigate functional connectivity during music listening. Clustering analysis 20,21 has so far not been applied to music neuroscience. In the broader domain of cognitive neuroscience, many methods have been used to address the clustering problem, such as k-means, 22 hierarchical clustering, 23 self-organizing maps (SOMs) 25 based on artificial neural networks, 24 graph clustering, 22,26 and fuzzy clustering. 27 When using clustering analysis, one critical step determines the appropriate algorithm and related parameters, such as the number of clusters K. Traditionally, one can either use a discriminative approach, such as model selection, to rank the competing algorithms (k-means, hierarchical clustering and SOM in this study) according to some given criteria, or use a generative approach to model the data generation process. However, obtaining the most appropriate algorithm and parameters from the discriminative approach becomes increasingly difficult because of the lack of ground truth in clustering problems and the high dimensionality of the data. The generative approach is also difficult when the data generation process is complicated, as in the fMRI paradigms used in this study.
To achieve consensus results in a model-free context and provide a more feasible alternative to generative design, we propose a new consensus clustering 28 strategy, called the binarization of consensus partition matrices (Bi-CoPaM). 29 Rather than ranking the different clustering algorithms, the Bi-CoPaM integrates the results generated by multiple clustering methods. Moreover, the results can be tuned in terms of the consensus level, reflecting the quality of the clusters. In this study, we selected three clustering algorithms widely used in neuroimaging, namely k-means, hierarchical clustering and SOM, 23,30 to be fed into the Bi-CoPaM pipeline to produce consensus results. The Bi-CoPaM pipeline extracts clusters from many (1856) datasets consisting of fMRI trials, each associated with a subject listening to one of hundreds of emotional music clips, and seeks those clusters characterized by consistently synchronized fMRI signal changes in most of the datasets. Using such a pipeline, we found that several brain structures related to visual, reward, and auditory processing have intrinsic temporal patterns of coherent neural activity during affective processing, without defining any explicit model.

Participants
Twenty-nine healthy subjects (15 females) without any hearing, neurological or psychological problems participated in this study. Among them, thirteen were musicians with, on average, 16.1 ± 6 (SD) years of formal musical training. The others were nonmusicians who had not received any formal musical training but nevertheless had a consistent interest in listening to music. At the time of the experiment, musicians reported practicing their instruments, on average, for 2.5 ± 3.4 (SD) hours per day, and nonmusicians declared that they listened to music several hours per day. The study was approved by the ethical committee of the Helsinki University Central Hospital and complied with the Helsinki Declaration. The dataset is a subset of a larger data collection, parts of which have been published in Refs. 31-33.

Music
Four stimulus categories were used in the fMRI experiment. These categories consisted of music that was classified by the subjects as liked and happy (LH), liked and sad (LS), disliked and happy (DH), and disliked and sad (DS). Participants were asked to bring four music pieces for each category, giving 16 pieces in total (for more details on selecting the music stimuli, cf. Ref. 31). Two 18-second music excerpts with 500 ms fade-in and fade-out were selected from each music piece using Adobe Audition and on the basis of a listening test. This yielded 32 excerpts, eight in each stimulus category.

fMRI experiment
The fMRI measurements were conducted with a 3 Tesla scanner (3.0T Signa VH/I General Electric) at the Advanced Magnetic Imaging (AMI) Center at Aalto. Participants rested on the scanner bed in a supine position. Music was presented via fMRI-compatible earphones with about 30 dB of gradient noise attenuation. Thirty-three oblique slices covering the whole brain (field of view 20 mm; 64 × 64 matrix; thickness 4 mm; spacing 0 mm) were acquired using an interleaved gradient echo-planar imaging (EPI) sequence (TR 3 s; echo time 32 ms; flip angle 90°), sensitive to blood-oxygen-level-dependent (BOLD) contrast.
During the fMRI experiment, participants listened to the 32 18-s music excerpts selected as described above, delivered via high-fidelity MR-compatible earphones. Each participant was presented with the 32 excerpts twice, in random order, each time prompted by a visual cue on the screen (once "like? dislike?", the other time "sad? happy?") to keep the participants concentrated on the emotional aspects of the stimuli. Each stimulus was followed by a 3-s interval without music, during which another cue asked the participants to answer the questions shown on the screen while they had listened to the previous music excerpt, by pressing MR-compatible button pads with the second and third fingers of the left or right hand. After the interval, a sinusoidal tone indicated the start of the next trial. The scanning session lasted 23 minutes. After a short break, anatomical T1-weighted MR images (field of view 26 mm; 256 × 256 matrix; thickness 1 mm; spacing 0 mm) were acquired for about 9 min (for more details about the experimental procedure and other scanning specifications, cf. Ref. 31).

Data preprocessing and preparation
The whole-brain images were preprocessed with statistical parametric mapping (SPM8, http://www.fil.ion.ucl.ac.uk/spm) and voxel-based morphometry (VBM, http://dbm.neuro.uni-jena.de/vbm/) for SPM. Each participant's images were segmented, realigned, spatially normalized into the Montreal Neurological Institute (MNI) template and spatially smoothed with a Gaussian filter with an FWHM of 6 mm (for more details, cf. Ref. 31). To prepare the data for the consensus analysis, we followed two steps using the fMRItoolbox (implemented at the University of Jyväskylä in the MATLAB environment): vectorization and segmentation. In the vectorization step, each 3D volume was converted to a vector (228,453 × 1) using a standard brain mask. This step was applied to every 3D volume scan of each subject, and all the scans were combined sequentially, forming the fMRI time series of each subject. According to the order in which the musical excerpts were played, the whole fMRI time series was segmented into 64 EPI brain volumes, each containing 6 or 7 time points (covering 18 s at a sampling rate of 3 s) and each corresponding to an instance when the participant was listening to a music clip. Figure 1 illustrates the order in which each music excerpt was presented (partial). For one excerpt, we used a 228,453 × 6 (or 7) matrix with rows corresponding to the voxels and columns corresponding to the time profile of this excerpt.

Fig. 1. Structure of music excerpts, where DH stands for disliked happy music, DS for disliked sad music, LH for liked happy music, and LS for liked sad music. Note that each category has different music excerpts for different subjects. For example, LS 1 listened to by subject 1 is different from LS 2 listened to by subject 2. Also, one category has different music excerpts for one particular subject. For example, in subject 1, the music representing the first DS 1 is not necessarily the same as the last DS 1.
In total, there are 1856 excerpts across all subjects, resulting in 1856 matrices of size 228,453 × 6 (or 7) that were used in the following clustering analysis.
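As a minimal sketch of the two preparation steps, the following Python snippet vectorizes toy 3D volumes with a boolean brain mask and segments the resulting time series into per-excerpt matrices. The array sizes, mask, and onsets are illustrative only; the actual analysis used MATLAB and a 228,453-voxel brain mask.

```python
import numpy as np

def vectorize(volume, brain_mask):
    """Flatten a 3D scan into a 1D vector keeping only in-mask voxels."""
    return volume[brain_mask]                      # -> (n_voxels,) vector

def segment(time_series, onsets, lengths):
    """Cut a (n_voxels, n_scans) series into per-excerpt matrices."""
    return [time_series[:, t:t + n] for t, n in zip(onsets, lengths)]

# Toy data: a 4x4x4 "brain", 12 scans, two 6-scan excerpts
rng = np.random.default_rng(0)
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True                         # 8 in-mask voxels
scans = [rng.random((4, 4, 4)) for _ in range(12)]
series = np.stack([vectorize(v, mask) for v in scans], axis=1)  # (8, 12)
excerpts = segment(series, onsets=[0, 6], lengths=[6, 6])
print(len(excerpts), excerpts[0].shape)            # 2 (8, 6)
```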

Consensus clustering (Bi-CoPaM)
The Bi-CoPaM method is a tunable consensus clustering method which recruits various single clustering algorithms over various datasets to identify the subset of objects (e.g. voxels) that are consistently correlated. 29 Given L datasets and C clustering methods, the Bi-CoPaM is applied in four main steps: (i) Partition generation: each of the C clustering methods is applied to each of the L datasets, yielding R = C × L partitions. (ii) Relabeling: because clustering is unsupervised, there are no labels for the clusters in the different partitions, i.e. the ith cluster in one partition is not guaranteed to match the ith cluster in another partition. Relabeling reorders the clusters in the partitions so that they become aligned; the min-min approach was used to perform relabeling. (iii) Fuzzy consensus partition matrix (CoPaM) generation: the relabeled partitions are averaged to produce a fuzzy CoPaM in which each voxel has a fuzzy membership value in each cluster, based on the number of individual partitions that assigned it to that cluster. (iv) Quenching/Binarization: the fuzzy CoPaM is binarized to produce the final binary partition. The difference threshold binarization (DTB) technique assigns a voxel to a cluster if and only if its fuzzy membership value in that cluster is higher than that of its closest competing cluster by at least the value of the parameter δ. The parameter δ ∈ [0, 1] controls the tightness of the clusters, where δ = 0.0 is the least tight (most inclusive) and δ = 1.0 is the tightest.
One important feature of the Bi-CoPaM is that the results are tunable. As clusters are tightened, many voxels are unassigned from their clusters without being reassigned to any other cluster, and smaller, more focused clusters are generated. Some clusters may become completely empty at relatively low δ values, while others resist higher levels of tightening. In general, the role of many of the clusters that empty at low δ values is to contain, and then filter out, the majority of the voxels that are irrelevant to the context. On the other hand, the voxels that resist higher δ values are those assigned to the same cluster by larger numbers of individual partitions, and are therefore expected to be more consistently correlated and more relevant to the context.
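Steps (iii) and (iv) above, and the effect of tightening δ, can be sketched in a few lines of Python. This is a simplified illustration with hypothetical toy partitions; the exact tie-breaking of the published DTB and the min-min relabeling step are not reproduced here.

```python
import numpy as np

def fuzzy_copam(partitions):
    """Step (iii): average R relabeled binary partitions, each given as a
    (K, n_voxels) 0/1 matrix, into a fuzzy membership matrix U."""
    return np.mean(partitions, axis=0)

def dtb(U, delta):
    """Step (iv): difference threshold binarization. A voxel joins its top
    cluster only if that membership beats the runner-up by at least delta;
    otherwise the voxel stays unassigned."""
    K, n = U.shape
    B = np.zeros((K, n), dtype=bool)
    for v in range(n):
        top, runner_up = np.sort(U[:, v])[::-1][:2]
        if top - runner_up >= delta:
            B[np.argmax(U[:, v]), v] = True
    return B

# Three relabeled partitions of 4 voxels into K = 2 clusters
P = [np.array([[1, 1, 0, 0], [0, 0, 1, 1]]),
     np.array([[1, 1, 0, 0], [0, 0, 1, 1]]),
     np.array([[1, 0, 1, 0], [0, 1, 0, 1]])]
U = fuzzy_copam(P)          # cluster 0 memberships: [1, 2/3, 1/3, 0]
loose = dtb(U, delta=0.0)   # every voxel assigned to its majority cluster
tight = dtb(U, delta=0.5)   # only the unanimous voxels 0 and 3 survive
```

Raising δ from 0.0 to 0.5 empties the contested voxels 1 and 2 from both clusters, mirroring the tightening behavior described above.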

Degree of freedom in Bi-CoPaM
The Bi-CoPaM method requires presetting the number of clusters (K). Moreover, the tightening parameter δ needs to be optimized. The M-N scatter plots technique tackles both issues. Each time the Bi-CoPaM is applied with a different K value, DTB binarization is performed over a range of δ values (e.g. from 0.0 to 1.0 in steps of 0.1). All of the individual clusters that appear in the results are scattered on a 2D plot where the horizontal axis (M) represents the average mean square error (MSE) of the cluster over all of the datasets,

\mathrm{MSE}_k = \frac{1}{L\,N_k} \sum_{l=1}^{L} \sum_{n \in C_k} \left\| x_n^l - z_k^l \right\|^2,

and the vertical axis (N) represents the logarithm of the number of voxels in the cluster, where L is the number of datasets, N_k is the number of voxels in the kth cluster, C_k is the set of voxels in the kth cluster, x_n^l is the normalized BOLD signal vector of the nth voxel in the cluster from the lth dataset, z_k^l is the average normalized BOLD signal vector of the voxels in the kth cluster from the lth dataset, and \|\cdot\| is the norm of a vector. The objective is to maximize the number of voxels included in the clusters while minimizing the dispersion within the clusters as measured by the MSE metric.
All of the individual clusters generated under all of the different combinations of parameters are scattered on this M-N plot, as shown in Fig. 2. The cluster closest to the top-left corner of the plot is selected as the best cluster (blue dot). This cluster is expected to be large, with many voxels (high vertical-axis value), yet tight, with high correlation (low horizontal-axis value). The selected cluster and all other clusters that overlap with it, even by a single voxel, are removed from the plot. Then, the remaining cluster closest to the top-left corner is selected as the second best distinct cluster. These steps of selecting clusters and removing those overlapping with the selected ones are repeated iteratively up to a preset maximum number of clusters, or earlier if the scatter plot becomes empty. The final number of clusters is thus not predetermined, as it depends on when the plot becomes completely empty. Moreover, the produced clusters are ordered in a descending manner by their tightness and size, measured by their closeness to the top-left corner. In practice, the top selected clusters are of interest for the downstream analysis, while most of the low-ranked clusters may be considered containers of irrelevant voxels and are thus discarded.
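The iterative selection can be sketched as follows. This is a simplified Python illustration with hypothetical clusters given as (voxel set, MSE) pairs; the published M-N plots technique may normalize the axes differently.

```python
import math

def mn_select(clusters, max_clusters=20):
    """Greedy M-N selection: repeatedly pick the cluster closest to the
    top-left corner (low MSE on the M axis, high log10(size) on the N
    axis), then drop every cluster sharing even one voxel with it."""
    pool = list(clusters)                 # each cluster: (voxel_set, mse)
    if not pool:
        return []
    n_top = max(math.log10(len(c[0])) for c in pool)   # corner height
    selected = []
    while pool and len(selected) < max_clusters:
        best = min(pool, key=lambda c: math.hypot(
            c[1], n_top - math.log10(len(c[0]))))
        selected.append(best)
        # remove the selected cluster and everything overlapping it
        pool = [c for c in pool if not (c[0] & best[0])]
    return selected

big_tight = ({1, 2, 3, 4, 5, 6, 7, 8}, 0.2)   # large and coherent
overlapping = ({1, 2, 9}, 0.1)                 # tight but overlaps above
small_loose = ({10, 11, 12, 13}, 0.9)
picked = mn_select([big_tight, overlapping, small_loose])
print([sorted(c[0]) for c in picked])
```

Here the large, tight cluster is picked first; the overlapping one is then discarded, so the final list contains two distinct clusters.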

Testing over-representation
The hypergeometric test uses the hypergeometric distribution to calculate the statistical significance of having drawn k successes in n total draws without replacement from a population of size N containing K successes. The probability of drawing exactly k successes is

P(X = k) = \binom{K}{k} \binom{N-K}{n-k} \Big/ \binom{N}{n},

where \binom{\cdot}{\cdot} is the binomial coefficient. 34 The hypergeometric test is often used to identify which sub-population is over- or under-represented in a sample. In this experiment, we use the hypergeometric cumulative distribution function (CDF) to compute the p value

P = P(X \ge k) = 1 - \sum_{i=0}^{k-1} \binom{K}{i} \binom{N-K}{n-i} \Big/ \binom{N}{n},

which is the probability of getting k or more instances of class a when one randomly selects n instances from a population of size N that contains K instances of class a. We utilized this test to examine the distributions of two contrasts of stimulus categories and one contrast of subject groups, i.e. liked versus disliked music and happy versus sad music as well as musician versus nonmusician, compared with their background frequency (N, K in the above formula). We take the null hypothesis to be that different categories or groups have equal effects on the BOLD signal. If a certain stimulus category or subject group is significantly over-represented in terms of the p value (e.g. p < 0.005), we conclude that this category or group has an effect on the BOLD signal in the corresponding condition.
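The test can be computed directly from the definition, e.g. in Python; the counts below are hypothetical, chosen only to illustrate an over-representation check.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability of at least k class-a items among n draws
    without replacement from N items of which K are class a."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) / total
               for i in range(k, min(K, n) + 1))

# Hypothetical counts: 64 excerpts, 32 liked; 20 "strong" responses of
# which 16 are liked -- is "liked" over-represented?
p = hypergeom_sf(16, N=64, K=32, n=20)
print(p)   # small p -> liked music over-represented among strong responses
```

Summing the upper tail directly is equivalent to the 1 − CDF form above and avoids subtraction of nearly equal quantities for extreme k.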

Cluster generation by Bi-CoPaM
Each of the 1856 excerpt datasets (normalized to zero mean and unit variance) was clustered by k-means, hierarchical clustering and SOM with K set to 10, 25, 50, and 100. The clustering results generated by the different algorithms with the four cluster numbers were combined using the Bi-CoPaM paradigm and selected by the M-N scatter plot, yielding the consensus clustering results. The brain regions within the clusters were extracted using the automated anatomical labeling (AAL) atlas. 35

Filtering
Two types of filtering were used to make the results more reliable. Firstly, the original clusters were filtered by discarding voxels with weak responses (voxels whose time series have a small variance), since the data being clustered were normalized and had thus lost signal magnitude information. In this analysis, the voxels whose variance was less than half of the mean variance over all voxels of one subject were discarded. After repeating this process for all subjects, we obtained 29 thresholded partitions. A voxel was then retained for the following analysis if more than 70% of the subjects showed a strong response at that voxel. After this step, the clusters contained only voxels with strong BOLD responses. A sensitivity test was also carried out with respect to the filtering parameters described above. Secondly, the clusters resulting from the previous step were filtered using the hypergeometric (HyperGeo) distribution test, which discarded isolated voxels. Voxels covering a large connected brain area feature a very small p value (normally below the 0.001 level), while those covering tiny isolated brain structures result in a relatively high p value (normally above the 0.1 level). We chose p = 0.001 to distinguish the large clusters from the isolated voxels.
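The first filtering step can be sketched as follows. This is a simplified Python illustration with a toy data layout; the real analysis operated on 228,453 voxels per subject in MATLAB.

```python
from statistics import pvariance

def strong_voxels(subject_series, var_ratio=0.5, subject_ratio=0.7):
    """Keep a voxel if, in at least subject_ratio of subjects, its
    time-series variance exceeds var_ratio times that subject's mean
    voxel variance (the normalized data lost magnitude information,
    so this restores a response-strength criterion)."""
    n_voxels = len(subject_series[0])
    votes = [0] * n_voxels
    for series in subject_series:            # one entry per subject
        variances = [pvariance(v) for v in series]
        threshold = var_ratio * sum(variances) / n_voxels
        for i, var in enumerate(variances):
            if var > threshold:
                votes[i] += 1
    cutoff = subject_ratio * len(subject_series)
    return [i for i, n_votes in enumerate(votes) if n_votes >= cutoff]

# Toy example: 3 subjects, 3 voxels; only voxel 0 responds strongly
subject = [[0, 5, 0, 5], [1, 2, 1, 2], [2, 2, 2, 2]]
print(strong_voxels([subject] * 3))   # -> [0]
```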

Excerpts pattern analysis
Utilizing the fact that the clustering in this study is based on the shapes of the BOLD responses to the stimuli, we inspected the differences in response shapes to test the hypothesis that different music categories (DH, DS, LH, LS) or different groups of participants (musicians versus nonmusicians) would elicit distinct shapes. Once the final clusters were obtained, the time series of the voxels within each final cluster were averaged for each stimulus, representing the mean time profile of that stimulus within that cluster, on which we carried out statistical tests of the response shapes. Figure 3 illustrates the excerpt data averaging process.
The averaged excerpt data were further clustered into groups, each having a distinct response shape. HyperGeo tests were carried out between the groups of musicians and nonmusicians, liked versus disliked stimuli, as well as sad versus happy stimuli. If a particular stimulus category or participant group was significantly over-represented in a cluster, that category or group was declared to tend to produce the corresponding response shape.

Analysis of music excerpts causing "strong" responses
We designed another analysis pipeline based on the time series of the clustering results, which searches for the musical categories that tend to elicit strong BOLD responses. For a particular subject and a cluster (e.g. Visual, Reward, or Auditory) containing V voxels, each voxel having a time series of 450 time points (scans), there are V × 450 data values in total. Each of these values (amplitudes) is assigned a number from 1 to 4 according to its position in quantile 1 to quantile 4. Figure 4 shows an example of visualizing the quantile data of a random cluster. After obtaining the quantile data matrix (V × 450) for a particular subject, we examined the modes of the quantile values within the time windows covering the duration of each stimulus (music excerpt). If the mode in a certain time window was 4, this was taken to mean that the "strong" response (red colored in Fig. 3) dominated during this stimulus. Hence, we extracted the excerpt categories eliciting the "strong" responses for that particular subject, and repeated this for every cluster of that subject, and then for every other subject and all clusters. To compensate for individual variability in the BOLD response, as shown in Fig. 5, we coded each subject's response values in the range from 1 to 4, irrespective of the range of the original responses and without reference to anybody else's responses. Furthermore, for a particular subject, the score of each excerpt was computed without reference to any other excerpt of any category for this subject. This ensures that the scores of the excerpts are independent subject-wise, category-wise, and excerpt-wise. Afterwards, we tested the distribution differences among the four stimulus categories and the two participant groups to inspect which type of stimuli or subject group elicited stronger responses than the others within the same cluster type (e.g. Visual, Reward, or Auditory). Figure 6 demonstrates the whole experimental pipeline of this paper.

Availability
The Bi-CoPaM method and the M-N plots technique are implemented in the R package "UNCLES", available at http://cran.r-project.org/web/packages/UNCLES/index.html.

Topology of clusters
We inspected the first 20 clusters (as ranked by the M-N plots algorithm), as these clusters showed very strong similarity in their response shapes and covered large continuous regions, thus complying with expectations based on knowledge of brain physiology. 36 Among these clusters, we found that emotion- and sensory-related brain areas were grouped into three separate clusters (see Fig. 7), which correspond well with the literature studying musical emotions with the model-based approach. 30,31 Cluster A comprises bilateral visual areas, namely the calcarine fissure and the cuneus. Cluster B comprises bilateral neural structures of the reward system, namely the ventral striatum (extending to the globus pallidus), the thalamus, the amygdala, the orbitofrontal cortex, and the left insula. Cluster C comprises the auditory areas, namely the bilateral superior temporal gyrus, Heschl's gyrus and the left middle temporal gyrus, as well as one region of the somatosensory cortex, namely the right Rolandic operculum.

Comparison between the multiple-algorithm combination and single algorithms
In order to evaluate the advantage of using the Bi-CoPaM with multiple methods over the Bi-CoPaM with a single method, we compared the clustering results obtained in four experimental scenarios: Bi-CoPaM with (1) k-means only, (2) hierarchical clustering only, (3) SOM only, and (4) the combination of all three methods. The Venn diagrams (Fig. 8) show the relationships among the results obtained with the different clustering algorithm settings for cluster A Visual, cluster B Reward, and cluster C Auditory.
All four experimental scenarios detected cluster A Visual, although the size and exact position of the voxels contained in the cluster differed depending on the paradigm used. For cluster B Reward, hierarchical clustering failed to detect it, whereas the Bi-CoPaM with all three methods found clusters that included voxels not present in the results obtained with either the KM or SOM method. Cluster C Auditory did not appear in the results of the single SOM algorithm and, similarly, the Bi-CoPaM with all three methods successfully found clusters around the bilateral auditory cortex with 89 voxels not included in the results obtained with either the KM or HC method. Thus the current findings show that, of the four experimental scenarios, Bi-CoPaM with all three methods provides the most complete set of clusters.

Filtering sensitivity test
We ran the filtering (Material and Methods, Sec. 2.8, Filtering) with different parameter combinations (percentage of total data and percentage of mean variance) and inspected the filtered cluster sizes to see the impact of the filtering on the results. From Tables 1 to 3, with the number in each cell representing the size of the cluster, we could verify that as long as the filtering parameter combinations were not extremely strict (bottom-right corner of each table), the performance remained very stable. The rectangles in the tables indicate the parameter combination used in this study.

Robustness test against subject functional data variability
Considering that brain activity in response to a given stimulus varies among individuals, the question arises whether the algorithm performs well despite the variation of functional data among participants. Since it had been demonstrated that Bi-CoPaM with the three methods provides the most complete set of clusters, we tested the robustness of Bi-CoPaM with the three clustering methods against data variability for the aforementioned three clusters. We generated two groups of subsets for the test. One was created by randomly selecting 75% of the musicians (10 out of 13) and nonmusicians (12 out of 16) as well as 75% of the excerpts of each selected participant, yielding a subset comprising approximately 56% of all the data from the fMRI experiment. This random selection was repeated 10 times, and the 10 subsets formed group A. Similarly, randomly selecting 90% of the musicians (12 out of 13) and nonmusicians (14 out of 16) as well as 90% of the excerpts of each selected participant, repeated 10 times, formed group B, comprising approximately 80% of all the data. We then applied Bi-CoPaM with the three clustering algorithms to these subsets and recorded the clustering results. Table 4 lists the results from group A and Table 5 lists the results from group B. Note that for the final sizes of the Reward and Auditory clusters, there were unspecified voxels within the AAL atlas which are not reported in Tables 4 and 5. Tables 4 and 5 illustrate that in most of the trials, the three important clusters (Visual, Reward, and Auditory) were identified despite the different subsets being used. Meanwhile, as the proportion of data used increased, the results became more stable. For example, the Reward cluster was missed three times in the first test reported in Table 4 but was never missed in the second test reported in Table 5.
On the one hand, this shows that our Bi-CoPaM method is robust to variability in participants' data and thus generates reproducible results; on the other hand, it also shows the benefit of using a large number of subjects for more reliable results in data-driven analysis of functional brain imaging data.

Table 4. Results of test group A (56% of the full data). Final size is the cluster size obtained using all participants' data. Trial size is the cluster size obtained in each trial. Intersection is the size of the overlap between the clusters in each trial and the clusters obtained using all participants' data.

HyperGeo tests on the response shapes were carried out with respect to the contrasts between experimental conditions and groups: liked versus disliked, happy versus sad, musician versus nonmusician. In cluster A Visual, we found a response difference between the musician and nonmusician groups, as shown in Fig. 9. For the response shape shown in the figure (initially decreasing and then steadily rising until the end of the stimulus), nonmusicians are significantly over-represented with a p value of 0.00053. Additionally, no significant difference in the shape of the BOLD responses was found for the contrasts liked versus disliked and happy versus sad in cluster A. For cluster B Reward and cluster C Auditory, we did not find any significant difference in response shapes for any of the contrasts.

Music categories and participant groups that cause "strong" BOLD response levels
We extracted the stimuli that caused the "strong" BOLD response (predominantly higher amplitudes) for cluster A Visual, cluster B Reward and cluster C Auditory. HyperGeo tests were then carried out on these distributions (Table 6) with respect to the contrasts liked versus disliked, happy versus sad and musician versus nonmusician.

Fig. 9. There are 248 excerpt responses (248 six-point temporal profiles) that were grouped together. Of these similar BOLD response shapes, 87 come from musicians (blue lines) and 161 come from nonmusicians (red lines). The p value for nonmusicians in this distribution is 0.00053, indicating that nonmusicians are more likely than musicians to elicit the above response shape.
We identified the stimulus categories that are more likely to elicit "strong" responses (higher BOLD response levels) at the Bonferroni-corrected threshold p = 0.017 (original p = 0.05). In cluster A Visual (calcarine fissure and cuneus), a larger number of excerpts of happy music than of sad music elicited stronger responses (p = 0.0042). In cluster B Reward (striatum, thalamus, amygdala, globus pallidus, and olfactory cortex), a larger number of excerpts of liked music than of disliked music elicited stronger responses (p = 0.000083). In cluster C Auditory (superior and middle temporal gyrus, Heschl's gyrus and Rolandic operculum), a larger number of excerpts of happy music than of sad music elicited stronger responses (p = 1.1e−8). The tests also showed that a larger number of excerpts of disliked music than of liked music elicited stronger responses in the brain regions encompassed by cluster C Auditory (p = 1.8e−12). We also tested the stimuli that simultaneously elicited strong and most similar responses in the brain areas of all three clusters. The results showed that a larger number of excerpts of happy music than of sad music elicited a stronger large-scale brain response network (p = 0.0023).

Discussion
In the current study, we tested a novel data-driven approach, aimed at integrating the results of several clustering algorithms rather than applying a single one, on a complex research question related to brain processing of musical emotions. In doing so we obtained several brain regions having consistent and robust patterns of functional connectivity in response to different musical emotions. Based on the clusters obtained by the Bi-CoPaM, we found the music categories that elicited strong responses in visual, reward, and auditory brain regions. In addition, we also observed different BOLD responses between musicians and nonmusicians.
We used the following criteria to ensure the quality of the clustering results. Firstly, we chose three commonly used clustering methods that have been employed in studies involving the analysis of fMRI data. 19,24 Secondly, we did not provide any spatial information but clustered the fMRI data purely on the basis of its time series. In other words, voxels are clustered based on their time series profiles and not on their topology in the brain. In addition, the fact that the overall size of the clusters was much larger than the Gaussian spatial smoothing kernel (about 30 voxels) means that this similarity is unlikely to come from the preprocessing step, which makes the results more reliable. Thirdly, the Bi-CoPaM pipeline seeks clusters with a large number of members showing consistently synchronized activities during most of the experimental conditions. In neuroimaging studies, the signal-to-noise ratio is often very low, 37,38 making it hard to draw conclusions from a single or even several experiments. The paradigm in this study overcomes this issue, since integrating the results from 1856 independent clustering processes means that the results are reproducible: if certain clusters appeared in only a few clustering trials, they would most likely be due to random error or other factors and would not be included in the final results. Remarkably, we obtained these results with a stimulation paradigm whose stimulus sets varied across participants, and with a subjective task related to affective ratings of the stimuli. Hence, the robust findings stand in contrast to the variability of the experimental paradigm.
We further explored how the tunable consensus clustering results from individual clustering methods differ from those obtained by Bi-CoPaM utilizing all three clustering methods. To this end, for each of the three clusters described above, we separately compared the clusters generated by k-means with those from Bi-CoPaM, by hierarchical clustering with those from Bi-CoPaM, and by SOM with those from Bi-CoPaM. We compared their agreements and differences and found that the three individual methods give different clustering results. For example, hierarchical clustering did not identify cluster B Reward, and SOM did not identify cluster C Auditory. Yet, on the same experimental data, the clusters generated by Bi-CoPaM with all three methods not only include the intersections among the results of the three individual methods, but also areas that were not identified by any individual method on its own. Therefore, the fact that Bi-CoPaM with multiple algorithms detected clusters that could not be identified by a single algorithm demonstrates the advantage of using Bi-CoPaM with multiple methods over using a single specific method.
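The idea of extracting what several methods agree on can be sketched with a co-association matrix. This is a simplified stand-in for Bi-CoPaM's actual relabeling and consensus partition matrix machinery, and the partitions below are hypothetical:

```python
import numpy as np

# Three hypothetical partitions of 6 voxels, e.g. from k-means, hierarchical
# clustering, and SOM (label values are arbitrary and differ across methods)
partitions = np.array([
    [0, 0, 0, 1, 1, 1],   # method 1
    [1, 1, 0, 0, 0, 0],   # method 2: disagrees with method 1 on voxels 2-3
    [2, 2, 2, 0, 0, 1],   # method 3
])

# Co-association matrix: fraction of methods placing voxels i and j together.
# Only co-membership matters, so the arbitrary label values drop out.
C = np.mean(partitions[:, :, None] == partitions[:, None, :], axis=0)

# Pairs grouped together by every method form the conservative consensus core
core = C == 1.0
```

Working with co-memberships rather than raw labels is what lets disparate algorithms be combined: each method votes on pairs of voxels, and the consensus can then be read off at any desired level of agreement.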
Our innovative use of the Bi-CoPaM methodology allowed us to find clusters comprising functionally and anatomically related neural networks responding to emotional music (for a broad discussion of the model-based findings obtained with the same dataset and of their relation to the acoustic features of the music stimuli, cf. Ref. 33).
After the cluster generation and selection, emotion-related brain structures responsible for rewarding and pleasurable sensations, such as the basal ganglia, thalamus, and insula, 39 and areas involved in the processing of auditory features, such as Heschl's gyrus, the Rolandic operculum, and the superior temporal gyrus, 40 were grouped into separate corresponding clusters. One of the most important findings of this study is that, without any predetermined model assigning a value to each stimulus, the Bi-CoPaM methodology was able to obtain a single cluster comprising the anatomically connected subcortical and cortical structures of the reward circuit, responding selectively to liked, enjoyed music. This is one of the few studies obtaining such a finding with a data-driven method. In a recent study using a data-driven network science method to study affective music processing, 16 no reward circuit activity was found. Our study confirms findings on the neural structures related to musical emotions obtained with model-based approaches. 31,39,[41][42][43][44][45] Neural structures of the reward circuit have otherwise been found to be connected only in a few other functional connectivity studies, such as one on attention-deficit/hyperactivity disorder (ADHD). 46 In our statistical tests, we also investigated the response shapes elicited by the different stimulus categories (liked, disliked, happy, and sad music) and experimental groups (musicians and nonmusicians). Unlike the traditional statistical tests for fMRI data that compare differences in response strength using a general linear model, 47 an approach that has often been questioned, 48 the clustering analysis of the mean excerpt response shape can distinguish differences in response shape, providing finer temporal information than a comparison of response magnitude alone.
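The distinction between comparing response magnitudes and comparing response shapes can be made concrete with a toy example; the time courses below are synthetic Gaussian bumps, not the study's BOLD data:

```python
import numpy as np

t = np.arange(0, 18, 0.5)  # hypothetical 18 s excerpt sampled every 0.5 s

# Two synthetic mean responses with identical peak magnitude but different
# temporal shapes (an early peak vs. a late peak)
early = np.exp(-((t - 5.0) ** 2) / 8)
late = np.exp(-((t - 12.0) ** 2) / 8)

# A magnitude-only contrast of the peaks sees no difference at all
peak_difference = abs(early.max() - late.max())

def zscore(x):
    return (x - x.mean()) / x.std()

# A shape comparison on normalized time courses exposes the divergence:
# the correlation between the two z-scored curves is far below 1
shape_similarity = np.corrcoef(zscore(early), zscore(late))[0, 1]
```

Normalizing each time course before comparison discards amplitude and retains only temporal shape, which is the extra information a shape-based analysis contributes over a magnitude contrast.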
The neurodynamics of functional connections related to affective responses to music have previously been studied only with model-based approaches and only within selected regions of interest, such as the caudate and the nucleus accumbens. 49,50 With the current results, we replicated those findings without falling into the risks of circular analysis. 8 Moreover, we evidenced a difference between musicians and nonmusicians in the temporal course of the BOLD response for the interconnected cortical areas of cluster A, including the calcarine fissure and the cuneus. This finding suggests a larger involvement of visual processes in nonmusicians, possibly related to imagery or even to a relaxation state accumulated as a consequence of listening to 18 s of emotionally loaded music. A similar result of tightened connectivity in visual clusters was found by Luo et al. 51 with participants lying in the MR scanner at rest with eyes closed, confirming the coupling of these areas during relaxation. Remarkably, while those authors focused their analysis of the differences between musicians and nonmusicians on other regions, a tendency for a larger recruitment of visual clusters seems to be present in nonmusicians, similar to the current study.
On the other hand, when the same dataset was analyzed with a model-driven approach to evidence regional activations during music listening, musicians showed larger regional activations than nonmusicians in somatomotor areas, such as the precentral and postcentral gyri and the cerebellar declive, whereas nonmusicians did not show any larger brain activity compared with musicians. The apparent discrepancy with previous findings obtained with the same dataset relates to the divergent approaches used. In the current study, functional coupling among areas within the same cluster was computed and the temporal dynamics of the BOLD response within each cluster were then compared between musicians and nonmusicians for all the stimulus conditions, whereas in the study by Brattico et al. 33 the overall magnitudes of the BOLD regional responses in the whole brain were compared between groups with the general linear model and post hoc t-tests. Moreover, since our main goal in the current study was to validate a new clustering approach rather than to test the neural adaptations to affective music listening brought about by musical training, we did not proceed to study the differences in response shape patterns between musicians and nonmusicians for each of the stimulus categories. In our previous study with the general linear model approach, we obtained new evidence for larger activations of reward-related areas in musicians than in nonmusicians, in line with previous findings by Chapin et al. 52 and James et al. 53 As discussed already in Ref. 54, future model-based and data-driven studies should resolve the issue of the role of expertise in shaping emotional responses to music in the brain.
Here, we first clustered the data not with one clustering algorithm but with three clustering algorithms independently; Bi-CoPaM then generated a consensus among the three sets of clusters. This removes the risk of capturing artefacts of an individual clustering algorithm. Furthermore, we analyzed the data not with one set of parameters but with many: for example, Bi-CoPaM explores consensus clusters with different degrees of tightness, which naturally avoids the pitfalls of a single set of clusters found with a single set of parameters. Thus, our study provides a robust approach for obtaining consistently strong activation patterns in neuroimaging studies of affect.
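The notion of tunable tightness can be sketched as a threshold on fuzzy consensus memberships. This is a simplification of Bi-CoPaM's actual binarization techniques, and the membership values below are hypothetical:

```python
import numpy as np

# Hypothetical fuzzy memberships of 8 voxels in one consensus cluster, i.e.
# the fraction of independent clustering runs that assigned each voxel to it
membership = np.array([1.0, 0.95, 0.9, 0.7, 0.55, 0.3, 0.1, 0.0])

def binarize(membership, tightness):
    """Keep only voxels whose consensus membership reaches the threshold."""
    return membership >= tightness

# Sweeping the threshold yields a family of clusters, from broad and
# inclusive (low tightness) to small and highly reproducible (tightness 1.0)
sizes = [int(binarize(membership, th).sum()) for th in (0.25, 0.5, 0.75, 1.0)]
```

Raising the tightness can only shrink a cluster, so the sweep produces a nested family of results instead of the single arbitrary partition a fixed parameter set would give.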