A Kernel Bayesian Adaptive Resonance Theory with a Topological Structure



Introduction
With recent Internet of Things (IoT) technology, massive amounts of information of different types are generated at every moment. However, it is difficult to extract useful information from such a huge amount of data from various sources. Cluster analysis is one of the common approaches in several research fields, such as statistics, machine learning and pattern recognition, for extracting hidden relationships from such data. k-means 1 and the Expectation-Maximization (EM) algorithm 2 are typical unsupervised clustering algorithms. The Self-Organizing Map (SOM) 3 is another type of clustering algorithm, which has a topological structure to visualize the structure of the data. However, k-means and the EM algorithm can only organize a predefined number of clusters, and an SOM tries to organize the data by a single network even if there are multiple clusters. To handle clustering more adaptively, several types of growing networks have been introduced. Growing Cell Structure (GCS) 4 and Growing Neural Gas (GNG) 5 are significant models of growing networks, which insert a new node into the region that has the maximum error. The Self-Organizing Incremental Neural Network (SOINN) 6 has successfully integrated the features of SOM and GNG.
In general, incremental neural networks face a trade-off between catastrophic forgetting and the ability to incrementally learn new knowledge, i.e. the "plasticity-stability dilemma". 7 Plasticity can be achieved with incremental learning algorithms such as GNG and SOINN. However, these models cannot maintain stability because they permanently insert new nodes at locations with high errors. In order to solve the plasticity-stability dilemma, Adaptive Resonance Theory (ART) 8 has been introduced. Fuzzy ART (FA), 9 Bayesian ART (BA) 10 and Kernel BA (KBA) 11 are considered fundamental models in the ART family. TopoART 12 is an online hierarchical self-organizing incremental clustering algorithm which is based on FA. Although TopoART has various advantages, a major issue associated with the FA learning process is its sensitivity to statistical overlap between the generated categories. 13 This sensitivity results in category proliferation (i.e. disordered generation of categories), which leads to a high computational cost and a reduction in clustering ability. One of the successful approaches to tackle the category proliferation problem and to improve clustering ability is the integration of Bayes' theorem into the ART architecture, namely BA. The significant property of Bayes' theorem-based ART is that the clusters are defined as Gaussian categories, which allows a category to grow and shrink by limiting the category hypervolume. However, Bayesian computation suffers from an expensive computational cost due to the covariance matrix calculation. Thus, if BA attempts to process a large number of samples with a high-dimensional feature space, it is difficult to maintain a feasible computational time, the likelihood calculation is highly unstable, and global convergence is difficult to achieve. 14
KBA applies the Kernel Bayes' Rule (KBR) 15 instead of the general Bayes' theorem in BA, and a Correntropy 16 -based alternative similarity measurement called the Correntropy-Induced Metric (CIM) 17 to solve the primitive issues of Bayesian computation described above. However, due to the original idea of ART, any input is considered as useful data for clustering. Therefore, ART-based models are potentially noise-sensitive.
In regard to self-organizing incremental clustering algorithms, such as SOINN and TopoART, the learning algorithm processes each sample as potentially informative and thus produces different cluster structures. This leads to variations in the generalization performance of cluster generation due to the proportion of error changes from one order of the sample presentation to the next one. In addition, the similarity measurement and a local error-based node insertion process are defined based on the Euclidean distance. Thus, the quality of a learned network (such as the distribution of clusters) is highly dependent on the presentation order of sample data in the learning sequence, and its calculation is unstable in the high-dimensional feature space.
In this paper, a new approach for a topological growing network, which is called Topological Kernel Bayesian ART (TKBA), is proposed to solve the primitive issues of TopoART and SOINN-based models. The cluster generation process in TKBA is performed by KBR and CIM. KBR maintains a fast and stable computation even in a high-dimensional space due to a covariance-free Bayesian computation. CIM is a kernel-based method from the information-theoretic learning perspective, which localizes the cross information potential and quantifies the similarity between probability distributions of samples and clusters. Therefore, CIM provides a stable measurement even in a high-dimensional space. In TKBA, CIM is utilized for the node insertion criterion, which provides a more generalized and stable matching criterion between samples and clusters than that of TopoART in the case of high-dimensional data and noisy environments. Furthermore, CIM is also utilized as a criterion to construct the topological networks, which contributes to their stability. Based on the above features of KBR and CIM, TKBA achieves a stable topological cluster network generation and a superior noise reduction ability compared with TopoART and SOINN-based models.
The main learning algorithm of TKBA is an extension of KBA. 11 Specifically, TKBA consists of the cluster generation process by KBA and a topology construction process. Therefore, TKBA also acquires the features of KBA, i.e., fast computation even in high-dimensional feature spaces, and the cluster generation process has a strong robustness against the influence of the order of given data. These features are further significant elements of the self-organizing ability in the topological models. By introducing the topological structure into KBA, the relationships between the clusters can be clearly understood. Thus, it is possible to develop an effective network construction process. In particular, we propose a new node deletion method based on topological connections and a CIM-based criterion and realize efficient cluster generation and topology construction. As mentioned above, although the learning algorithm in TKBA is similar to KBA, the cluster generation process is different due to the topology construction process. Moreover, the self-organization ability of TKBA is not only improved compared with KBA, but it is also successfully extended from functional perspectives.
This paper is organized as follows: Section 2 presents a literature review of conventional unsupervised clustering algorithms. Section 3 describes the details of TKBA. Section 4 presents simulation experiments in terms of the self-organizing capability with synthetic data and the classification ability with real-world datasets. Concluding remarks are presented in Sec. 5.

Literature Review
Information modeling based on computational intelligence has attracted attention from scientists and engineers. Typical supervised learning models, such as the Support Vector Machine (SVM) 21 and the Extreme Learning Machine (ELM), 22 are studied in several research fields due to their superior classification performance. 18-20 Furthermore, recent innovative technologies represented by deep learning 23 and their applications have been actively developed. 24-30 Besides, information modeling is applied to communication technologies and optimization methods. 31-34 However, these methods require well-structured labeled information for their learning algorithms. In contrast, unsupervised learning plays an important role in information modeling with unlabeled information. In particular, the significance of cluster analysis is expected to increase further in the growing IoT society, where a huge volume of information and data without structured labels is generated day by day. 35 Typical unsupervised clustering algorithms are k-means 1 and the EM algorithm. 2 SOM 3 is one of the representative topological clustering algorithms and is utilized for data visualization. However, k-means and the EM algorithm are able to organize only a predefined number of clusters. In addition, SOM tries to organize multiple separated clusters by a single network. GCS 4 and GNG 5 have successfully solved this problem of SOM by implementing a growing network architecture. Due to their superior performance, GNG-based clustering models have been integrated with several algorithms such as semi-supervised learning 36 and hierarchical clustering. 37 One of the problems of GNG is excessive cluster creation. In response to this issue, Grow When Required (GWR) 38 provides an effective solution. That is, GWR inserts nodes whenever the state of the current network does not sufficiently match the input data.
Another noteworthy approach is the integration of GNG with CIM, 16 which is called GNG-CIM. 39 CIM is a Correntropy 16 -based similarity measurement which is introduced from the information-theoretic learning perspective. CIM can be considered as an alternative criterion to the Euclidean distance-based one. Adapting the clustering algorithm in the CIM sense is equivalent to reducing the localized cross information potential, an information-theoretic function that quantifies the similarity between two probability distributions based on the Gaussian kernel function. The insertion of new clusters is determined by the distribution density of existing clusters, and therefore excessive cluster insertion is suppressed.
A model combining the characteristics of SOM and GNG has also been proposed, which is known as SOINN. 6 SOINN is able to grow incrementally and to accommodate input patterns of (non)stationary data distributions, which are processed with a Euclidean distance-based similarity measurement and an error-based node insertion criterion. SOINN has a great noise reduction ability due to its two-layer architecture. However, the weights of the neurons are not stable and each layer has a high number of relevant parameters. To tackle this problem, several types of SOINN-based models have been introduced. Enhanced SOINN (ESOINN) 40 simplifies the structure of SOINN to a single-layer model and reduces the number of predefined parameters. Furthermore, ESOINN is able to separate classes with a high-density overlap based on their distribution. Adjusted SOINN (ASOINN) 41 further reduces the number of parameters from ESOINN. Although these models maintain a fast computation, the learned network distributions are unstable with respect to high-dimensional data because of their Euclidean distance-based network construction process. Nakamura and Hasegawa introduced the SOINN with Kernel Density Estimation (KDESOINN), 42 which can estimate the probability density function underlying the given information based on the learned node distributions. Although KDESOINN achieves good noise reduction and self-organizing capabilities, if the input data has a high dimensionality, the performance of KDESOINN is adversely affected by the kernel density estimation (KDE) process.
GNG and SOINN provide a learning algorithm that incorporates new knowledge into a growing network representation. Due to this adaptability and applicability, several studies have applied topological networks to practical applications. 28,43 However, since these models insert new nodes permanently into the network, they have the potential to cause catastrophic forgetting. This trade-off is called the plasticity-stability dilemma. ART 8 is one of the representative approaches to solve the plasticity-stability dilemma. ART matches top-down learned expectations with bottom-up input. FA 9 is considered to be the leading incremental clustering algorithm among ART-based neural networks. TopoART 12 is an online hierarchical self-organizing incremental clustering algorithm which is based on FA. TopoART combines the advantages of ART and topology network learning, which enables the match-based fast stable learning and intrinsic self-organization that are inspired by the functions of the human brain. TopoART consists of two FAs, which are called TopoARTa and TopoARTb, respectively. TopoARTa performs clustering by FA using all the input data, and TopoARTb generates clusters using only the input data that contribute to cluster generation in TopoARTa. Although FA has various advantages, its major issue is its sensitivity to statistical overlap between the generated categories. 13 This sensitivity results in category proliferation (i.e. disordered generation of categories), which leads to a high computational cost and a reduction in the classification accuracy. Several studies have been introduced to tackle the category proliferation problem and to improve the clustering and classification abilities. 44 One of the successful approaches is the integration of Bayes' theorem into the ART architecture, namely BA 10 and KBA. 11
The significant property of Bayes' theorem-based ART is that the clusters are defined as Gaussian categories, which allows a category to grow and shrink by limiting the category hypervolume. Furthermore, the cluster activation with respect to ART learning is performed probabilistically, which enables probabilistic inference using all the associated clusters from the given information.

Principle of Topological Kernel Bayesian ART
In this section, firstly the fundamentals of KBR and CIM are described, then the learning algorithm of KBA is briefly introduced. Finally, a topology construction process of TKBA is presented.

Kernel Bayes' Rule
KBR has been introduced by Fukumizu et al. 15 as a nonparametric kernel method for realizing Bayes' rule. In KBR, a prior probability, a posterior probability, and a likelihood are all expressed as kernel means and covariance operators, which are learned nonparametrically in the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}$. Note that, in this case, the term "nonparametric" means that the probability density function estimation is data dependent and not determined a priori. Furthermore, the calculation of the posterior probability is performed by straightforward matrix operations on the RKHS, which means that the computational cost is proportional to the sample dimensions. 45 Let us suppose that the kernel mean of the posterior probability is calculated from observed samples $X = (x_1, x_2, \ldots, x_L)$ ($x_l \in \mathbb{R}^d$) under the cluster distribution $\{(P(y_k), y_k)\}_{k=1}^{K}$ ($P(y_k) \in$ measure space $S$, and $y_k \in$ measure space $R$), where $y_k$ denotes an existing cluster and $P(y_k)$ denotes the prior probability of cluster $y_k$. Here, the cluster posterior probability $\hat{P}(y_k | x_l)$ is defined by Eq. (1), where $\hat{B}_{Q_{S|r}}$ ($r = 1, 2, \ldots, K$) is a Gram matrix representation of the kernel mean of the posterior probability, which is calculated by Eq. (2). Equation (2) is in turn defined by Eqs. (3)-(6), where $\mathcal{K}$ denotes the positive definite kernel. In this paper, the Gaussian kernel function is utilized as the positive definite kernel $\mathcal{K}$, i.e. $\exp(-\|x - y\|^2 / (2\sigma_{kbr}^2))$, to maintain a fast convergence. 46 Here, the kernel bandwidth $\sigma_{kbr}$ affects the sensitivity of KBR. $\varepsilon_K$ and $\delta_K$ denote the regularization constants and $\gamma$ represents the weighting factor. In the original paper of KBR, 15 these factors are set as $\varepsilon_K = 0.01/K$, $\delta_K = 2\varepsilon_K$, and $\gamma = 1.0$, respectively, where $K$ denotes the number of clusters in the network. $G_S$ and $G_R$ denote Gram matrices, which are symmetric and positive semi-definite. Let $\Omega = (v_1, \ldots, v_n)$ be a set of arbitrary vectors; then the Gram matrix $G$ is defined by Eq. (7), whose $(i, j)$-th element is the kernel value $\mathcal{K}(v_i, v_j)$. The Gram matrices $G_S$ and $G_R$ in KBR are defined by Eq. (7) with $P(y_k)$ and $y_k$, respectively. For the details of the derivation of Eqs. (3)-(6), refer to Fukumizu et al. (2013). 15 The summary of KBR is presented in Algorithm 1.
1: Compute Gram matrices G S and G R using Eq. (7).
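The first step of Algorithm 1 can be illustrated with a minimal sketch, assuming the Gaussian kernel stated above and the standard Gram matrix construction in which element (i, j) is the kernel value of the i-th and j-th vectors. The function names `gaussian_kernel` and `gram_matrix` are hypothetical and do not come from the original implementation; the remaining steps of KBR (Eqs. (1)-(6)) are not reproduced here.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)) for two vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def gram_matrix(vectors, sigma):
    """Gram matrix whose (i, j)-th element is the kernel of vectors i and j.

    The result is symmetric with ones on the diagonal, because the
    Gaussian kernel is symmetric and equals 1 for identical inputs.
    """
    n = len(vectors)
    return [
        [gaussian_kernel(vectors[i], vectors[j], sigma) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical cluster centers used only for illustration.
    clusters = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for row in gram_matrix(clusters, sigma=1.0):
        print(row)
```

In this sketch the bandwidth argument `sigma` plays the role of the kernel bandwidth of KBR; a smaller value makes the off-diagonal elements decay faster with distance.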

Correntropy-Induced Metric
Correntropy, 16 which is a generalized similarity measure between two sample vectors, is defined by Eq. (8), where $X = (x_1, x_2, \ldots, x_L)$ and $Y = (y_1, y_2, \ldots, y_K)$ are arbitrary sample vectors. $\kappa_\sigma$ is a kernel function that satisfies Mercer's theorem. 21 It induces an RKHS, and thus correntropy can be defined as the dot product of the two random variables in the feature space as in Eq. (9), where $\phi$ denotes a nonlinear mapping from the input space to the feature space based on the inner product operation in Eq. (10). In practical terms, correntropy is described by Eq. (11) because only a finite number of data $L$ is available. Correntropy is able to induce a metric, which is called CIM. CIM quantifies the similarity between two probability distributions as in Eq. (12). It can be considered that CIM is a kernel-based similarity measurement which localizes the cross information potential and quantifies the similarity between the probability distributions of samples. In this paper, the Gaussian kernel function is utilized as the kernel function $\kappa_\sigma$ in Eq. (12), i.e. $\exp(-\|x - y\|^2 / (2\sigma_{cim}^2))$. Here, the kernel bandwidth $\sigma_{cim}$ affects the sensitivity of CIM.
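As an illustration of the measure described above, the following minimal sketch uses the standard sample estimator from the correntropy literature, i.e. the average of the Gaussian kernel over paired elements, and the induced metric given by the square root of the kernel value at zero minus that estimate. These forms and the function names are assumptions for illustration and may differ in notation from the equations of the paper. The sketch also assumes that both vectors have the same length and that the Gaussian kernel is not normalized, so that its value at zero is 1 and CIM lies in [0, 1].

```python
import math


def correntropy(x, y, sigma):
    """Sample estimate of correntropy between two equal-length vectors.

    Assumed form: the mean of exp(-(x_l - y_l)^2 / (2 * sigma^2)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    )
    return total / len(x)


def cim(x, y, sigma):
    """Correntropy-induced metric, assumed as sqrt(kernel(0) - correntropy).

    With the unnormalized Gaussian kernel, kernel(0) = 1, so the value is
    0 for identical vectors and approaches 1 for very distant ones.
    """
    # max() guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - correntropy(x, y, sigma)))


if __name__ == "__main__":
    print(cim([0.2, 0.4], [0.2, 0.4], sigma=0.1))  # identical vectors
    print(cim([0.2, 0.4], [0.9, 0.1], sigma=0.1))  # distant vectors
```

The bandwidth `sigma` controls how quickly the measure saturates: with a small bandwidth, even moderately distant vectors yield a value close to 1.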

Topological Kernel Bayesian ART
The learning algorithm of TKBA is an extension of KBA. 11 TKBA consists of the learning algorithm of KBA and a topology construction process between the clusters generated by KBA. The learning algorithm of KBA is summarized in Algorithm 2. It is divided into three processes, namely (i) cluster choice, (ii) cluster match, and (iii) cluster learning, which are indicated in Algorithm 2 in lines 1-6, lines 7-17, and lines 9-10, respectively. TKBA integrates the topology construction process into KBA as a new process. The topology construction process defines topological connections between clusters, so that connected clusters represent similar or related information. In TKBA, the topology construction process consists of edge creation (a), cluster deletion (b), and edge deletion (c), as follows:
(a) Edge Creation
Once a cluster match occurs and the 2nd winner cluster also satisfies the match criterion $V_{J_l} \le V_{MAX}$, the 1st and 2nd winner clusters are connected by an edge. The 1st and 2nd winner clusters are determined by $J_l = \arg\max_{k \in K} [P(y_k | x_l)_{KBR}]$ in the cluster choice. Unlike in GNG, the edges in TKBA do not have age information.
For the sake of a stable topology construction, the cluster deletion (b) and edge deletion (c) processes are performed at a predefined cycle λ.
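The edge creation rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the posterior values and the CIM-based match values of all clusters are assumed to be already computed for the current sample, the search for another winner after a failed match is omitted, and the names `connect_winners`, `posteriors`, and `match_values` are hypothetical.

```python
def connect_winners(posteriors, match_values, v_max, edges):
    """Connect the 1st and 2nd winner clusters if both satisfy the match.

    posteriors   : posterior value of each cluster for the current sample
    match_values : CIM-based match value of each cluster for that sample
    v_max        : match criterion threshold (V_MAX in the text)
    edges        : set of frozenset({i, j}) pairs; edges carry no age
    """
    # Rank clusters by posterior value, highest first.
    ranking = sorted(
        range(len(posteriors)), key=lambda k: posteriors[k], reverse=True
    )
    if len(ranking) < 2:
        return edges  # an edge needs two clusters
    first, second = ranking[0], ranking[1]
    if match_values[first] <= v_max and match_values[second] <= v_max:
        edges.add(frozenset((first, second)))
    return edges


if __name__ == "__main__":
    edges = connect_winners([0.1, 0.6, 0.3], [0.5, 0.1, 0.2], 0.3, set())
    print(edges)
```

Storing each edge as an unordered pair reflects the statement that edges in TKBA have no age information; only the existence of a connection is recorded.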

(b) Cluster Deletion
As a cluster deletion criterion, the similarity between a cluster $y_k$ that satisfies the cluster match (i.e. $V_{J_l} \le V_{MAX}$) and the sample $x_l$ is defined by CIM as an error $E_{cim} \in [0, 1]$ of the cluster. The initial value $E_{cim} = 1$ is given to a newly generated cluster, and $E_{cim}$ becomes zero when $V_{J_l} = 0$. The error $E_{cim}$ is updated during the cluster match by using $V_{J_l}$, which is calculated by CIM. A large error $E_{cim}$ means that there is no sample near the cluster. In the proposed model, a cluster is deleted if its error $E_{cim}$ is larger than the square of $V_{MAX}$, namely $E_{cim} > V_{MAX}^2$. Once this condition is fulfilled, the cluster $y_k$ is removed from the clusters $Y$. Furthermore, clusters in regions where samples $x_l$ are input infrequently tend to be isolated from other clusters (i.e. they have no edge). It is considered that isolated clusters are generated by noise samples. Therefore, an isolated cluster $y_k$, which no longer has an emanating edge, is also removed from the clusters $Y$.
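The two deletion conditions described above (an error larger than the square of V_MAX, or no emanating edge) can be sketched as follows. The update rule of the error itself is not reproduced; the error values are taken as given. The function name `surviving_clusters` and the edge representation are hypothetical, and the sketch checks isolation against the edges as they are before any deletion, which is a simplifying assumption.

```python
def surviving_clusters(errors, edges, v_max):
    """Return the indices of clusters that are kept after deletion.

    errors : current error E_cim of each cluster, assumed to lie in [0, 1]
    edges  : iterable of pairs (i, j) of connected cluster indices
    v_max  : match criterion threshold (V_MAX in the text)
    """
    # Clusters that have at least one emanating edge.
    connected = {index for edge in edges for index in edge}
    threshold = v_max ** 2
    return [
        k
        for k, error in enumerate(errors)
        if error <= threshold and k in connected
    ]


if __name__ == "__main__":
    errors = [0.01, 0.50, 0.02, 0.00]
    edges = {frozenset((0, 2))}
    # Cluster 1 has a large error and cluster 3 is isolated.
    print(surviving_clusters(errors, edges, v_max=0.3))
```

A fuller implementation would presumably also drop the edges attached to deleted clusters and run this step only once per cycle λ, as stated in the text.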
(c) Edge Deletion
TKBA does not have an age factor for the edges; therefore, edge deletion is performed only when an edge intersection is detected. In TKBA, an intersecting edge is detected by the cross product-based detection algorithm, 47 which is applied to all the clusters that have an emanating edge. If an intersection is detected, the edge which has the maximum CIM is removed. Although the cross product-based detection algorithm 48,49 is mainly effective for intersections in a three-dimensional space, an edge intersection is a significantly infrequent event in a high-dimensional space. Thus, TKBA utilizes the inner product-based intersection detection method which is detailed in Cormen 47 to reduce the computational load.
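For reference, the classical cross product-based test for whether two line segments intersect, as described in standard algorithm textbooks, can be sketched for the two-dimensional case as follows. This is an illustrative sketch only: it is not the detection code of TKBA, it does not cover the high-dimensional variant mentioned above, and the function names are hypothetical.

```python
def direction(p, q, r):
    """Cross product (r - p) x (q - p); its sign tells on which side of
    the line through p and q the point r lies."""
    return (r[0] - p[0]) * (q[1] - p[1]) - (q[0] - p[0]) * (r[1] - p[1])


def on_segment(p, q, r):
    """True if r, known to be collinear with p and q, lies between them."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 and segment p3-p4 share at least one point."""
    d1 = direction(p3, p4, p1)
    d2 = direction(p3, p4, p2)
    d3 = direction(p1, p2, p3)
    d4 = direction(p1, p2, p4)
    # Proper crossing: each segment straddles the line of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Boundary cases: an endpoint lies on the other segment.
    if d1 == 0 and on_segment(p3, p4, p1):
        return True
    if d2 == 0 and on_segment(p3, p4, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, p3):
        return True
    if d4 == 0 and on_segment(p1, p2, p4):
        return True
    return False


if __name__ == "__main__":
    print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))
    print(segments_intersect((0, 0), (1, 0), (0, 1), (1, 1)))
```

Note that this test also reports two edges that merely share an endpoint as intersecting. When applied to a topological network, edges that emanate from the same cluster would therefore need to be excluded before the test, which is an assumption of this sketch rather than a detail given in the text.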
The learning sequence of TKBA is summarized in Algorithm 3. Figure 1 shows examples of the self-organizing results of KBA and TKBA. As described in Sec. 1, any input is considered as useful data for clustering due to the original idea of ART. Thus, KBA generates many unnecessary clusters due to the uniform Gaussian noise. In contrast, TKBA successfully organizes a concentric structure by clearly separated topological networks. From this simple example, it can be seen that TKBA improves the self-organizing capability and enhances the functionality of KBA by the topology construction process proposed in this section.
When the learned clusters $Y$ are utilized for a classification task, an input sample is classified into the cluster $y_k$ which has the minimum CIM between the input sample and the clusters. This procedure is summarized in Algorithm 4. In addition, by averaging the counts of label information of clusters in a topological network during a learning sequence, the label attribute of the topological network can be determined, and it will be utilized for supervised learning.
Algorithm 4
Require: the input samples $x_t$ ($t = 1, 2, \ldots, T$), and the parameter of clusters $Y = (y_1, y_2, \ldots, y_K)$.
Ensure: the nearest cluster $y_k$ corresponding to the given sample $x_t$
1: Input a vector $x_t$ to a topological network.
2: Compute CIM between $x_t$ and $Y$.
3: Select a cluster $y_k$ which has a minimum CIM.
4: if $t < T$ then
5: Continue from step 1 with $t \leftarrow t + 1$
6: end if
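The classification procedure described above can be sketched as follows. The sketch assumes the standard Gaussian-kernel form of CIM (the square root of one minus the mean kernel value over paired elements), which may differ in notation from the equations of the paper; the function names and the example data are hypothetical.

```python
import math


def cim(x, y, sigma):
    """CIM between two equal-length vectors with an unnormalized Gaussian
    kernel (assumed form: sqrt(1 - mean kernel value))."""
    mean_kernel = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    ) / len(x)
    return math.sqrt(max(0.0, 1.0 - mean_kernel))


def classify(samples, clusters, sigma):
    """For each sample, return the index of the cluster with minimum CIM."""
    assigned = []
    for sample in samples:  # steps 1, 4 and 5: iterate over all samples
        # Step 2: CIM between the sample and every cluster.
        distances = [cim(sample, cluster, sigma) for cluster in clusters]
        # Step 3: select the cluster with the minimum CIM.
        assigned.append(distances.index(min(distances)))
    return assigned


if __name__ == "__main__":
    clusters = [[0.0, 0.0], [1.0, 1.0]]
    samples = [[0.1, 0.0], [0.9, 1.0]]
    print(classify(samples, clusters, sigma=0.5))
```

If two clusters have the same CIM to a sample, this sketch selects the one with the lower index; the text does not specify a tie-breaking rule.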

Computational Complexity
In this section, the complexity of typical unsupervised clustering algorithms is discussed. In addition, the complexity of representative supervised classification algorithms is also presented as a reference.
The complexity of an algorithm can generally be divided into two factors, i.e. (i) a computational complexity which deals with how long the algorithm is executed, and (ii) a space complexity which focuses on how much memory is used by an algorithm. In this paper, we focus on the computational complexity. It is represented by a Big-O notation with the symbols N , D, C, I, and L which denote the number of samples, the dimensions of the sample, the number of clusters, the number of iterations, and the size of batch samples, respectively.
As mentioned in Sec. 2, ASOINN is an extension model of ESOINN. It can be considered that the computational complexity of ASOINN is equivalent to that of ESOINN, i.e. $O(NC^2I)$, which is introduced in Asadi et al. 50 The summary of the computational complexities is shown in Table 1. Among TKBA, KDESOINN, ASOINN, and TopoART, ASOINN shows the lowest computational complexity. Being ART-based algorithms, TKBA and TopoART take an additional computational cost. However, TKBA is composed of a single layer, and thus its computational complexity is lower than that of TopoART. Comparing TKBA and KDESOINN, the multipliers of $N$ and $C$ of KDESOINN are larger than those of TKBA. Therefore, it is considered that TKBA has a lower computational complexity than KDESOINN.

Simulation Experiments
This section presents the simulation experiments for evaluating the self-organizing and classification abilities of TKBA. In this paper, SOINN-based models, i.e. KDESOINN 42 and ASOINN, 41

Effect of Parameters in TKBA
Firstly, to provide a better understanding of the parameters in TKBA, the self-organizing results with several parameter settings are presented. The dataset for this demonstration consists of 2D synthetic data as shown in Fig. 2. The dataset is divided into six distributions, A, B, C, D, E and F, with 15k data samples each. Here, A and B follow 2D Gaussian distributions, C and D are concentric-ring distributions, and E and F are sinusoidal distributions. In this experiment, we utilize the dataset shown in Fig. 2(a). The other datasets (Figs. 2(b)-2(d)) are utilized in the next section. TKBA has four significant parameters that have a strong influence on its performance. The basic parameters of TKBA are defined as shown in Table 2. Under this setting, the self-organizing result for the dataset in Fig. 2(a) is depicted in Fig. 3(b). The rest of the results in Fig. 3 are obtained with the parameter σ cim = 0.015, 0.050, 0.100 and 0.200. In Fig. 3, a red circle denotes a generated cluster and a black line represents an edge between clusters. Similar to Fig. 3, Figs. 4-6 show the effect of the parameters σ kbr , V MAX , and λ, respectively, with the same notation for clusters and edges. Based on Fig. 3, when the parameter σ cim increases, the number of clusters decreases and the distance between clusters increases. In addition, each data distribution tends to be combined with the others. Therefore, σ cim has a strong influence on the number of clusters. In contrast, as shown in Fig. 4, the influence of the parameter σ kbr on the number of clusters is quite low. Focusing on Fig. 5, V MAX has an influence on the number of clusters; however, its impact is not as high as that of σ cim . Furthermore, since V MAX has a role as a matching criterion, it is preferable that its value is fixed. The parameter λ also affects the number of clusters; however, its impact decreases as λ takes a larger value.
From the above discussion, although TKBA has four significant parameters, the parameter σ cim plays an important role in the performance of TKBA. Therefore, we focus only on the parameter σ cim in the following section.

Self-organizing Ability
The self-organizing ability is examined with the same 2D synthetic data used in the previous section (Fig. 2). In this simulation, to evaluate the robustness of the models, uniform random noise is added to the original dataset. In this research, uniformly distributed random data is added to the data space as "noise" for distributions A, B, C, D, E, and F. Therefore, the noise in this experiment does not organize another distribution, and it is considered that the noise belongs to one of the six distributions. The simulation experiments are conducted in two environments, i.e. stationary and nonstationary environments. In the stationary environment, the data samples are randomly selected from the whole dataset.
In the nonstationary environment, the distributions A-F are sequentially exposed to the network. In the experiment, each data sample is exposed to the network only once. The parameter settings of each model for the self-organizing ability are summarized in Tables 2 and 3. The parameters of each model are tuned empirically to achieve the best Normalized Mutual Information (NMI), 56 Micro and Macro F-Scores 57 (i.e. all the results show 1.0) for the dataset shown in Fig. 7(a). The dataset in Fig. 7(a) does not contain any noise, and therefore it is easy to set parameters that achieve the best results. The obtained parameters are then applied to the problems with noise, so that the robustness of the models can be estimated. In regard to KBR, the parameters follow  Fig. 8, as the noise ratio increases, the topological network collapses in KDESOINN. In contrast, in Fig. 7, TKBA shows an outstanding noise reduction ability with a stable network organization. Figures 11-14 show the generated topological networks of TKBA, KDESOINN, ASOINN, and TopoART, respectively, under the nonstationary environment. Similar to the stationary environment, ASOINN and TopoART suffer from sensitivity to noise in the topology construction. In Fig. 12, KDESOINN generates a topological network without collapsing as in Fig. 8(d). However, the clusters are connected between different distributions of the dataset. In contrast, as in the stationary environment, TKBA shows a superior self-organizing ability with a strong noise reduction performance. Figure 15 shows examples of failed topology construction by TKBA with 50% noise added. As the noise ratio rises, the distribution of the generated clusters in TKBA becomes unstable. In addition, similar to KDESOINN and ASOINN, TKBA also tends to combine clusters which belong to different distributions, or to generate useless clusters.
Comparing the generated topological networks in the stationary and nonstationary environments, the SOINN-based models, namely ASOINN and KDESOINN, generate networks that have huge gaps (in terms of the density of clusters) depending on the input order of data samples. On the other hand, ART-based models such as TKBA and TopoART generate a similar topology in each environment. It can be seen that the ART-based approach is superior in the robustness of the self-organizing ability to the input order of sample data.
Secondly, the quality of topological networks is discussed from the statistical perspective. In this paper, the quality of a network means how well data can be represented by the generated network. To assess the quality of a learned topological network which is generated from the dataset containing noise, the data samples without noise are exposed to the network, and the nearest class to each data sample is searched to calculate NMI and Micro and Macro F-Scores. In this experiment, we assigned the label information to each data distribution A to F as classes 1-6, respectively, for calculating NMI, Micro, and Macro F-Scores. In addition, to reduce the bias resulting from the random sampling of training data, 10-fold cross-validation is utilized. Moreover, all the experiments are conducted in 20 trials to obtain consistent averaged results. In addition, the Wilcoxon signed-rank test 58 is employed to determine whether one algorithm has a statistically significant difference, and the null hypothesis is rejected at the significance level of 0.05.
The results are summarized in Table 4. The generated topological networks are clearly separated into six distributions as shown in Figs. 7 and 11. TKBA shows the highest score in each measurement. Furthermore, the standard deviation of each score indicates the outstanding stability of the self-organizing capability of TKBA. The superior stability of ART-based models is also shown by the numbers of clusters and classes, which are similar in the stationary and nonstationary environments.
From the above results, TKBA has a self-organizing ability that is robust both to noisy environments and to the input order of sample data.

Classification Ability
This section presents the comparison of TKBA, KDESOINN, ASOINN, and TopoART in terms of the classification performance, the robustness, and the processing time per sample of each model by utilizing 12 real-world datasets from the UCI repository of machine learning databases. 59 The datasets in this experiment are summarized in Table 5. In addition, the results of k-means 1 are shown as a standard unsupervised classification algorithm. Note that the value of k in k-means is set as the number    of classes in each individual UCI dataset. Furthermore, the results of typical supervised classification algorithms, i.e. kernel SVM (kSVM) 60 and ELM, 22 are also shown.
The parameters of each model are summarized in Table 6. The rest of the parameters are the same as those in Tables 2 and 3. In addition, the parameters of KBR are also the same as those in Sec. 4.1. The parameters in Table 6 are tuned by preliminary experiments utilizing the fewer number of samples in each dataset. We repeatedly calculate with different parameter settings and adopt the parameters that obtained the highest NMI. Since parameter settings are changed in an arbitrarily fixed range when changing the parameter condition, there is a possibility of finding an even better parameter setting by sophisticated optimization algorithms.
Similar to the self-organizing ability assessment, 10-fold cross-validation is utilized with 20 trials to obtain consistent averaged results. During the network learning sequence, each data sample is presented 100 times. Although several datasets provide training and test data separately, both are merged randomly and cross-validation is applied to the entire data. Furthermore, the Wilcoxon signed-rank test is employed to determine whether the difference between algorithms is statistically significant, and the null hypothesis is rejected at a significance level of 0.05.
Table 7 shows the results of classification performance. For ASOINN, although each measurement generally shows a high score, the standard deviation is larger than that of the other models, which indicates that the classifier generation process is unstable. KDESOINN shows smaller standard deviations than ASOINN. However, the model is likely to generate excessive clusters in the network. TopoART tends to generate far more classes than the other models, especially for high-dimensional data, due to the shortcomings of FA. In contrast, TKBA shows higher measurement scores than the other models while maintaining small standard deviations and smaller numbers of clusters and classes. In addition, the Micro and Macro F-scores of TKBA show similar values. This suggests that the model is more stable than the compared models under class imbalance, except for the Poker Hand dataset, which is an imbalanced dataset.
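The evaluation protocol described above (merged data, 10-fold cross-validation, 20 trials) can be sketched as an index generator. This is only a sketch under stated assumptions: the name `repeated_kfold`, the reshuffling of the data at the start of every trial, and the fixed seed are illustrative choices and are not specified in the text.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_trials=20, seed=0):
    """Yield (trial, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for trial in range(n_trials):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # assumption: the merged data is reshuffled per trial
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test_idx = folds[f]
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield trial, f, train_idx, test_idx


if __name__ == "__main__":
    splits = list(repeated_kfold(n_samples=25))
    print(len(splits))  # number of (trial, fold) combinations
```

Each yielded split would be used to train a model on `train_idx` and to compute the scores on `test_idx`; the scores are then averaged over all folds and trials.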
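To compare the robustness and adaptability of the models, the Adjusted Rand Index (ARI) 61 is considered. Let $n_{ij}$ be the number of samples that are in both class $u_i$ and cluster $v_j$. Let $n_{i\cdot}$ and $n_{\cdot j}$ be the number of samples in class $u_i$ and cluster $v_j$, respectively, and let $n$ be the total number of samples. The ARI is defined as follows:
$$\mathrm{ARI} = \frac{\sum_{i,j}\binom{n_{ij}}{2} - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right] \Big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_{i}\binom{n_{i\cdot}}{2} + \sum_{j}\binom{n_{\cdot j}}{2}\right] - \left[\sum_{i}\binom{n_{i\cdot}}{2}\sum_{j}\binom{n_{\cdot j}}{2}\right] \Big/ \binom{n}{2}}$$
In general, a higher ARI shows that the model has a better performance.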
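To make the ARI concrete, the following minimal sketch computes it from the contingency counts defined above, using the standard pair-counting form of the index. The function name `adjusted_rand_index`, the handling of degenerate inputs (returning 1.0), and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(classes, clusters):
    """ARI between class labels u_i and cluster labels v_j."""
    n = len(classes)
    if n < 2:
        return 1.0  # assumption: trivial input is treated as perfect agreement
    n_ij = Counter(zip(classes, clusters))   # samples in class u_i and cluster v_j
    n_i = Counter(classes)                   # samples per class (n_i.)
    n_j = Counter(clusters)                  # samples per cluster (n_.j)

    sum_ij = sum(comb(v, 2) for v in n_ij.values())
    sum_i = sum(comb(v, 2) for v in n_i.values())
    sum_j = sum(comb(v, 2) for v in n_j.values())

    expected = sum_i * sum_j / comb(n, 2)    # agreement expected by chance
    max_index = (sum_i + sum_j) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate partitions count as identical
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    # Cluster names differ from class names, but the partitions are identical.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is invariant to how the clusters are named, which is why it suits models that generate their own number of clusters.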
As shown in Fig. 16, TopoART has difficulty handling high-dimensional samples. In contrast, TKBA shows the highest ARI for the majority of datasets. Therefore, it can be seen that TKBA has superior robustness and adaptability to different types of problems.
In Fig. 17, the processing time per sample for each model is summarized. In general, the processing time increases as the number of clusters and classes of the model and the dimensionality of the samples increase. TopoART shows the longest processing time for the Poker Hand dataset because it has generated excessive clusters. From Fig. 17, it can be seen that TKBA is able to maintain fast processing even in the case of high-dimensional samples.
In summary, from the results in this section, it can be stated that TKBA is able to perform fast and stable computation with outstanding noise reduction capability even in a high-dimensional space. Furthermore, TKBA shows superior robustness for different types of tasks. Thus, TKBA is a successful approach for enhancing the capabilities of topological growing network algorithms.

Conclusions
In this paper, a new unsupervised topological clustering algorithm is introduced by combining the ART-based topological growing network and the kernel framework. TKBA successfully integrates the Bayesian kernel approach, the generalized similarity measurement and the topology construction process in the ART framework.
The results of the self-organizing experiments with the synthetic dataset showed that TKBA is able to perform noise reduction and stable topology construction. In addition, classification experiments with real-world datasets have revealed that TKBA has the capability to deal with several types of data while maintaining superior robustness, adaptability and fast computation. In summary, the typical problems of self-organizing growing network models can be dealt with by TKBA as follows:
- The self-organizing ability of TKBA is robust to different orders of the given data.
- TKBA achieves fast computation in a high-dimensional space due to its kernel framework.
- TKBA acquires high noise reduction capability due to CIM.
Improving the interpretability and selectivity of a huge amount of information makes it possible to further extend the scope of use of big data in the IoT society. 62 The interpretability and selectivity of information could be enhanced by compressing/expanding the information to an arbitrary granularity. One of the approaches to realize this is to apply a hierarchical architecture to a model. Thus, as future work, a hierarchical architecture will be introduced to TKBA for further improvement of its performance in terms of functionality and ability. From the functional perspective, TKBA would be able to compress/expand input information in a more flexible way. The noise reduction and stable self-organizing capabilities are among the expected ability improvements. As an algorithmic improvement, adaptive parameter optimization should be considered to reduce the number of parameters that need to be adjusted.