SYNTHETIC MINORITY OVERSAMPLING TECHNIQUE AND FRACTAL DIMENSION FOR IDENTIFYING MULTIPLE SCLEROSIS

Multiple sclerosis (MS) is a severe brain disease, and early detection enables timely treatment. Fractal dimension provides a statistical index of how the pattern in a given brain image changes with scale. In this study, our team used the susceptibility weighted imaging technique to obtain 676 MS slices and 880 healthy slices. We used the synthetic minority oversampling technique to balance the dataset. Then, we used the Canny edge detector to extract distinguishing edges. The Minkowski–Bouligand dimension, a fractal dimension estimation method, was used to extract features from the edges. A single hidden layer neural network was used as the classifier. Finally, we proposed a three-segment representation biogeography-based optimization to train the classifier. Our method achieved a sensitivity of 97.78±1.29%, a specificity of 97.82±1.60% and an accuracy of 97.80±1.40%. The proposed method is superior to seven state-of-the-art methods in terms of sensitivity and accuracy.


INTRODUCTION
Among the various brain diseases, multiple sclerosis (MS) [1][2][3] damages the insulating covers of neural cells. The symptoms include mental, psychiatric, and physical problems [4]. The symptoms may disappear between attacks [5]. Currently, treatments are provided to improve patients' functioning. Life expectancy is about 5-10 years lower than that of unaffected people [6].
Diagnosis of MS is usually based on magnetic resonance imaging (MRI) scanning. However, a challenge arises due to the "normal-appearing white matter (WM)" problem [7], which causes the lesions within the WM to appear the same as healthy WM [8]. Even experienced neuroradiologists may not perceive the differences.
Computer vision (CV) [9,10] has outperformed human beings in many fields, and scholars have already used CV in MS detection. For example, Wang and Wu [11] offered an adaptive chaotic particle swarm optimization (ACPSO) method. Murray, Rodriguez and Pattichis [12] proposed a novel multiscale amplitude-modulation frequency-modulation (MAMFM) method. Nayak et al. [13] presented a new abnormal brain classifier on the basis of two-dimensional discrete wavelet transform (2D-DWT) and random forest (RF). Zhan and Chen [14] employed biorthogonal wavelet transform (BWT) and logistic regression (LR). Yang [15] utilized Hu moment invariant (HMI). Zhou [16] employed stationary wavelet entropy (SWE), with decision tree (DT) and support vector machine (SVM) as classifiers. Siddiqui et al. [17] proposed an automated and intelligent medical decision support system (AIMDSS). Phillips et al. [18] used wavelet entropy (WE) and presented a novel hybridization method to train the classifier.
Fractal dimension (FD) is a relatively new image feature used in CV, and it has been successfully applied in coal [19], structure rebuilding [20], delamination detection [21], etc. In this study, we tentatively test FD for extracting distinguishing features from MS brain images.
Classifier establishment is another important problem. After investigating and analyzing current classifiers, we decided to use a single hidden layer neural network (SHLNN) because of the universal approximation theorem. The training of the classifier may suffer from several problems: local minima, vanishing and exploding gradients, etc. These problems can be alleviated by swarm-intelligence training algorithms, which can train the weights and biases of different classifiers. For example, Arabasadi et al. [22] used genetic algorithm (GA) to train the neural network. Ji
The remainder of this paper is organized as follows: Section 2 gives the source of materials. Section 3 describes the rationale of the methodology. Section 4 presents and discusses the results. Section 5 concludes the paper.

MATERIALS
We acquired 676 slices from 38 MS patients from the website of the University of Cyprus. All those slices contain plaques. In addition, we acquired 880 brain slices from 34 healthy controls in local hospitals in China. The details of both the MS patients and the healthy subjects are described in Ref. 16.

Histogram Stretching
Since the dataset sources are from different countries, we employed the histogram stretching (HS) [26] method to improve slice comparability. There are other advanced image enhancement techniques, such as the local adaptive histogram equalization method [27], pulse-coupled neural network method [28], local gray-level transformation [29], deep autoencoder method [30], etc. Nevertheless, HS is simple and easy to implement. Practically, we increased the dynamic range of all images to the same level: each gray value $c(x, y)$ was rescaled to
$$\frac{c(x, y) - c_{\min}}{c_{\max} - c_{\min}},$$
where x denotes the horizontal position and y the vertical position. The $c_{\min}$ and $c_{\max}$ are defined as
$$c_{\min} = \min_{x,y} c(x, y), \qquad c_{\max} = \max_{x,y} c(x, y).$$
The first row in Fig. 1 shows two MS slices with five plaques and three plaques, respectively. The last row in Fig. 1 shows two healthy slices. All the slices are extracted along the axial direction with different slice indices.
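The stretching step can be illustrated with a minimal Python sketch. It assumes that a slice is stored as a list of rows of gray values and that every slice is mapped to a common target range; the function name `histogram_stretch` and the default range [0, 1] are illustrative assumptions, not the original implementation.

```python
def histogram_stretch(image, new_min=0.0, new_max=1.0):
    """Linearly rescale gray values so the slice spans [new_min, new_max].

    `image` is a list of rows; the target range is an assumed parameter.
    """
    flat = [v for row in image for v in row]
    c_min, c_max = min(flat), max(flat)
    if c_max == c_min:
        # Constant image: there is no dynamic range to stretch.
        return [[new_min for _ in row] for row in image]
    span = c_max - c_min
    return [[new_min + (v - c_min) * (new_max - new_min) / span for v in row]
            for row in image]


if __name__ == "__main__":
    toy_slice = [[50, 75], [100, 150]]  # toy gray values, not real data
    print(histogram_stretch(toy_slice))
    print(histogram_stretch(toy_slice, 0.0, 255.0))
```

After stretching, the darkest pixel of every slice maps to the lower bound and the brightest to the upper bound, which is what makes slices from different scanners comparable.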

Synthetic Minority Oversampling Technique
As the number of healthy slices (880) is larger than the number of MS slices (676), we used the synthetic minority oversampling technique (SMOTE) [31] to increase the number of MS slices. Undersampling was not used, since it would reduce the size of the dataset. We first take a sample I from the minority class and find its k nearest neighbors. We select one neighbor J from the k neighbors. We then draw a vector V from I to J:
$$V = J - I.$$
Finally, a new synthetic sample S is generated as
$$S = I + \alpha V,$$
where α is a random number between 0 and 1, obeying the uniform distribution.
Using the SMOTE method, we generated 204 synthetic MS samples. In total, we have 880 healthy slices and 880 MS slices. Thus, our dataset contains 1760 images.
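The oversampling procedure can be sketched in a few lines of Python. The sketch assumes that each sample is a plain list of numbers and that neighbors are ranked by Euclidean distance; the function name `smote`, the toy data, and the choice k = 3 are hypothetical, since the value of k is not stated here.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples S = I + alpha * (J - I)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(minority)
        # k nearest neighbours of I (excluding I itself), squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(i, p)),
        )[:k]
        j = rng.choice(neighbours)
        alpha = rng.random()                      # uniform in [0, 1)
        v = [b - a for a, b in zip(i, j)]         # vector from I to J
        synthetic.append([a + alpha * d for a, d in zip(i, v)])
    return synthetic


if __name__ == "__main__":
    # Toy 2-D minority samples (illustrative only).
    toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
    n_needed = 880 - 676                          # slices needed to balance the classes
    new_samples = smote(toy, n_needed, k=3, seed=1)
    print(n_needed, len(new_samples))
```

Because every synthetic sample lies on the segment between a minority sample and one of its neighbors, the new points stay inside the region already occupied by the minority class.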

Canny Edge Detection on Brain Slices
FD is conventionally computed on binary images; hence, scholars usually combine edge detection with FD. Edge detection aims to identify the edges where brightness is discontinuous.
There are many edge detection methods. The Canny edge detector (CED) [32] is among the most effective, and is widely used in FPGA architecture [33], remote sensing, etc. The procedures of CED are listed in Table 1.
Figure 2 compares different edge detectors on a hand image. Figure 2a shows the original image. Figures 2b-2f provide the edge detection results by Laplacian of Gaussian (LoG), Robert edge detector (RED), Prewitt edge detector (PED), Sobel edge detector (SED) and CED, respectively. The CED result contains the most abundant texture information among all edge detectors, and CED did not remove any distinguishing features.

FD Model
Table 1 Procedures of CED.
Step 1  Use Gaussian filter to remove noise and smooth the image
Step 2  Calculate and locate the intensity gradients
Step 3  Use non-maximum suppression to eliminate spurious edges
Step 4  Employ double threshold to find potential edges
Step 5  Track edges based on hysteresis

FD is widely used for image encoding [34,35], image segmentation [36], image classification [37], cloud computing [38], etc. Assume F denotes the FD, S the scaling factor, and M the number of sticks for measuring; we can deduce the relationship among those three variables:
$$M \propto S^{-F}. \tag{6}$$
This scaling rule is coherent with traditional concepts of dimension and geometry [39,40]. Afterwards, we extend the above formula to the general case as
$$F = -\frac{\log M}{\log S}. \tag{7}$$
Figure 3 shows several samples. The number of sticks M used to cover the shape is M = 1 for Figs. 3a-3c. For the second row, M = 2, 4 and 8 for Figs. 3d-3f, respectively. We can observe that the relationship is in line with Eqs. (6) and (7).
Another example is the Koch curve [41] with an FD F of 1.2619. Figure 4 shows the Koch curve. Figure 4a shows the curve with iteration I = 1, Fig. 4b shows the curve with I = 2, and so forth.
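As a quick numerical check of the scaling rule, the sketch below evaluates F = -log M / log S. It assumes that S is the ratio by which each measuring stick is shrunk: S = 1/2 for the second row of Fig. 3 is our reading of the figure, while M = 4 and S = 1/3 per iteration is the standard construction of the Koch curve. The function name `fractal_dimension` is illustrative.

```python
import math


def fractal_dimension(m, s):
    """F = -log(M) / log(S), with 0 < S < 1 the assumed shrink ratio of a stick."""
    return -math.log(m) / math.log(s)


if __name__ == "__main__":
    # Line, square and cube measured with sticks of half the size.
    for name, m in (("line", 2), ("square", 4), ("cube", 8)):
        print(name, fractal_dimension(m, 1 / 2))

    # Koch curve: each segment is replaced by 4 segments of one third the length.
    print(round(fractal_dimension(4, 1 / 3), 4))  # 1.2619
```

The line, square and cube give dimensions close to 1, 2 and 3, and the Koch curve reproduces the value 1.2619 quoted above.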

Minkowski-Bouligand Dimension
Section 3.2 shows the theoretical model of FD.
In practice, scholars usually use the Minkowski-Bouligand dimension (MBD) [42] to estimate the FD of a given image. MBD lays a regular grid over the image and counts the number of boxes that overlap the edges.
The grid scale G in this study was assigned the values 16, 8, 4, 2 and 1 in sequence. Therefore, we obtain a five-element vector as the feature space. Figure 5 shows the hand image covered by boxes with size G of 8. Fractional calculus methods [43][44][45] shall be considered in the future.
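The box-counting step can be sketched as follows, assuming the edge map is a binary image stored as a list of rows (nonzero means edge pixel). The five counts form the feature vector described above; the least-squares slope in `slope_dimension` is an additional, commonly used way to turn the counts into a single dimension estimate and is included only for illustration. All function names are hypothetical.

```python
import math


def box_count(edges, g):
    """Number of g-by-g grid boxes that contain at least one edge pixel."""
    rows, cols = len(edges), len(edges[0])
    count = 0
    for r0 in range(0, rows, g):
        for c0 in range(0, cols, g):
            if any(edges[r][c]
                   for r in range(r0, min(r0 + g, rows))
                   for c in range(c0, min(c0 + g, cols))):
                count += 1
    return count


def mbd_features(edges, scales=(16, 8, 4, 2, 1)):
    """Five-element feature vector: one box count per grid scale G."""
    return [box_count(edges, g) for g in scales]


def slope_dimension(counts, scales):
    """Least-squares slope of log(count) against log(1/G); counts must be positive."""
    xs = [math.log(1.0 / g) for g in scales]
    ys = [math.log(n) for n in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


if __name__ == "__main__":
    # Toy edge map: a diagonal line in a 16 x 16 image.
    diagonal = [[1 if r == c else 0 for c in range(16)] for r in range(16)]
    scales = (16, 8, 4, 2, 1)
    counts = mbd_features(diagonal, scales)
    print(counts, slope_dimension(counts, scales))
```

For the diagonal line, halving the box size doubles the count, so the slope is close to 1, as expected for a one-dimensional shape.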

Fig. 5 The hand image covered by boxes with size G of 8.

Single Hidden Layer Neural Network
Theoretically, the SHLNN can approximate any function due to the universal approximation theorem: any SHLNN with a finite number of neurons can approximate continuous functions on compact subsets of Euclidean space, under mild assumptions on the activation function. This advantage led us to choose SHLNN over other classifiers, including extreme learning machine [46,47], linear regression classifier [48][49][50], support vector machine [51][52][53], DT [54], Bayesian classifier [55], etc. Many publications discuss the concept and structure of SHLNN [56,57]; in this paper, we only briefly introduce its fundamentals. Suppose there are d different classes, f-dimensional features, and Z labelled samples. The z-th training sample x(z) is an f-dimensional feature vector with a corresponding label over the d classes. Given the input x(z), the trained classifier produces an output, from which the predicted label P(z) is obtained. The criterion of the SHLNN is to minimize the mean-squared error between the predicted label P(z) and the true label of x(z) over the Z samples.
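A minimal sketch of the forward pass and the training criterion is given below. It assumes a sigmoid hidden layer, a linear output layer, one-hot class labels, and a predicted label taken as the index of the largest output; these choices and all function names are assumptions made for illustration, not details confirmed by the text.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def shlnn_output(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: f inputs -> hidden neurons (sigmoid) -> d linear outputs."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


def predicted_label(output):
    """Index of the largest output (assumed decision rule)."""
    return max(range(len(output)), key=lambda k: output[k])


def mean_squared_error(outputs, targets):
    """Squared error summed over classes and averaged over the Z samples."""
    total = sum((o - t) ** 2
                for out, tgt in zip(outputs, targets)
                for o, t in zip(out, tgt))
    return total / len(outputs)


if __name__ == "__main__":
    x = [1.0, 2.0, 4.0, 8.0, 16.0]              # toy five-element feature vector
    w_hidden = [[0.0] * 5, [0.0] * 5]            # two hidden neurons, zero weights
    out = shlnn_output(x, w_hidden, [0.0, 0.0], [[1.0, 1.0], [0.0, 0.0]], [0.0, 0.5])
    print(out, predicted_label(out), mean_squared_error([out], [[1.0, 0.0]]))
```

A training algorithm then searches for the weights and biases that make `mean_squared_error` as small as possible over the training set.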

Transform from Training to Optimization
To establish the SHLNN classifier, we need to train the weights and biases. Besides, an important parameter, the number of hidden neurons, should be optimized. In this study, we transform these two problems into a single optimization problem, which accomplishes both tasks simultaneously.
The idea is the three-segment representation (TSR), inspired by Ref. 42. For any swarm-intelligence algorithm, we divide the solution candidate into three segments. Segment 1 (S1) encodes the weights, Segment 2 (S2) encodes the biases, and Segment 3 (S3) encodes the number of hidden neurons, as shown in Fig. 6.
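One possible way to decode such a candidate is sketched below. Because the exact layout of Fig. 6 is not reproduced here, the sketch makes an assumption: S1 and S2 are allocated for a fixed maximum number of hidden neurons `h_max` so that all candidates have the same length, and only the first h entries are used, where h is read from S3. The function name `decode_tsr` and the parameter `h_max` are hypothetical.

```python
def decode_tsr(candidate, n_features, n_classes, h_max):
    """Split a flat candidate [S1 | S2 | S3] into SHLNN parameters.

    Assumed layout: S1 = hidden weights (h_max x f) then output weights
    (d x h_max); S2 = hidden biases (h_max) then output biases (d);
    S3 = one value giving the number of hidden neurons.
    """
    n_w = h_max * n_features + n_classes * h_max
    n_b = h_max + n_classes
    if len(candidate) != n_w + n_b + 1:
        raise ValueError("candidate length does not match the assumed layout")
    s1, s2, s3 = candidate[:n_w], candidate[n_w:n_w + n_b], candidate[-1]
    h = max(1, min(h_max, int(round(s3))))       # active hidden neurons
    w_hidden = [s1[j * n_features:(j + 1) * n_features] for j in range(h)]
    offset = h_max * n_features
    w_out = [s1[offset + k * h_max: offset + k * h_max + h]
             for k in range(n_classes)]
    b_hidden = s2[:h]
    b_out = s2[h_max:h_max + n_classes]
    return w_hidden, b_hidden, w_out, b_out, h


if __name__ == "__main__":
    # Five features (one per grid scale), two classes, at most four hidden neurons.
    candidate = [float(i) for i in range(34)] + [3.0]
    w_hidden, b_hidden, w_out, b_out, h = decode_tsr(candidate, 5, 2, 4)
    print(h, len(w_hidden), len(w_out[0]), b_hidden, b_out)
```

With this layout, the optimizer can change the network size simply by moving the last element of the candidate, while the weights and biases of the unused neurons are ignored.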

TSR-BBO
The TSR was then embedded into the biogeography-based optimization (BBO). The combined method is named TSR-BBO. BBO is a popular swarm-intelligence algorithm that has been widely used in cost optimization [58], fruit classification [59], sonar classification [60], etc.
In BBO, each habitat (i.e. solution candidate) is measured by a fitness function named the "habitat suitability index (HSI)". The HSI of a habitat relies on variables named "suitability index variables (SIV)" [61]. In this study, the SIV is encoded by the three-segment representation.
BBO includes three important procedures: (i) migration, (ii) mutation and (iii) elitism. In what follows, we introduce them briefly. Solution candidates with larger HSI are more likely to emigrate; in contrast, candidates with smaller HSI are more likely to immigrate. Let s denote the number of species, and S the maximum number of s. Assume the emigration rate is w and the immigration rate is v; their relationship is depicted by the following equations:
$$w = W \frac{s}{S}, \qquad v = V \left(1 - \frac{s}{S}\right),$$
where W denotes the maximum value of w and V the maximum value of v.
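The migration rates can be illustrated with the linear model that is standard in BBO, in which emigration grows and immigration shrinks as the number of species increases. The sketch assumes this linear form, w = W s / S and v = V (1 - s / S); the function name and the default maxima W = V = 1 are illustrative.

```python
def migration_rates(s, s_max, w_max=1.0, v_max=1.0):
    """Return (emigration w, immigration v) for a habitat with s species.

    Assumes the linear BBO model: w = W * s / S and v = V * (1 - s / S).
    """
    ratio = s / s_max
    return w_max * ratio, v_max * (1.0 - ratio)


if __name__ == "__main__":
    for s in (0, 5, 10):
        print(s, migration_rates(s, 10))
```

A habitat with the maximum number of species only emigrates (v = 0), while an empty habitat only immigrates (w = 0).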
Mutation aims to increase the diversity of the ecosystem [62]. Suppose n is the mutation rate and N is the maximum mutation rate. We further assume r(s) denotes the solution probability of species s, and R is the maximum value of r. We can deduce that
$$n = N \left(1 - \frac{r(s)}{R}\right).$$
In view of the SIV, the mutation was implemented with a random number generator c, which follows the uniform distribution in the range [0, 1]. Suppose a candidate at the k-th step, $Q_k$, is
$$Q_k = [Q_{1,k}, Q_{2,k}, \ldots, Q_{I,k}].$$
Here, $Q_{i,k}$ represents the i-th SIV value at the k-th step, and I the dimension of the solution space. A temporary mutated variable M is generated from c, U(i) and L(i), where U(i) and L(i) denote the upper bound and lower bound of the i-th SIV. M is then mapped to the range [L(i), U(i)], and is assigned to $Q_{i,k}$ only if the mutated candidate provides a better HSI value than the original candidate, as judged by the HSI function a.
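The mutation step can be sketched as follows. The exact perturbation used to build the temporary variable M is not recoverable from the text, so the sketch assumes a uniform perturbation proportional to the width of the search range, followed by clamping to [L(i), U(i)] and greedy acceptance on the HSI; the function names and the toy HSI are hypothetical.

```python
import random


def mutation_rate(r_s, r_max, n_max):
    """n = N * (1 - r(s) / R): unlikely solutions mutate more often."""
    return n_max * (1.0 - r_s / r_max)


def mutate(candidate, lower, upper, hsi, rate, rng):
    """Mutate each SIV with probability `rate`; keep a change only if HSI improves."""
    best = list(candidate)
    for i in range(len(best)):
        if rng.random() >= rate:
            continue
        c = rng.random()                                    # uniform in [0, 1)
        # Assumed perturbation: shift by up to half the range in either direction.
        m = best[i] + (c - 0.5) * (upper[i] - lower[i])
        m = min(upper[i], max(lower[i], m))                 # map into [L(i), U(i)]
        trial = best[:i] + [m] + best[i + 1:]
        if hsi(trial) > hsi(best):                          # greedy acceptance
            best = trial
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    toy_hsi = lambda q: -sum(x * x for x in q)              # toy HSI, larger is better
    start = [2.0, -3.0, 1.5]
    print(mutate(start, [-5.0] * 3, [5.0] * 3, toy_hsi, rate=1.0, rng=rng))
```

Because a mutated value is kept only when it improves the HSI, mutation in this sketch can add diversity without degrading the candidate.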
Elitism keeps the best solutions, in order to prevent them from being destroyed by the mutation procedure [63]. Elitism is performed by setting v = 0 for the specified number of elites. Without elitism, the BBO optimizer has difficulty converging.
TSR-BBO operates in the same way as BBO when used to train the SHLNN, except that the former can optimize the number of hidden neurons while the latter cannot.

The Whole System
Our proposed MS identification method is built from proven components: histogram stretching, SMOTE, CED, MBD, SHLNN, TSR and BBO. Its diagram is shown in Fig. 7. The performance was measured by 10 runs of 10-fold cross-validation.
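The fold construction for one run of 10-fold cross-validation can be sketched as below. The sketch assumes stratified folds, so that each of the 10 folds of the balanced 1760-slice dataset receives 88 MS and 88 healthy slices; the function name `stratified_folds` and the label coding (1 = MS, 0 = healthy) are illustrative.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)             # deal indices round-robin
    return folds


if __name__ == "__main__":
    labels = [1] * 880 + [0] * 880               # 1 = MS, 0 = healthy (assumed coding)
    for run in range(10):                        # 10 runs, each with a new shuffle
        folds = stratified_folds(labels, k=10, seed=run)
        sizes = {(sum(labels[i] for i in f), len(f)) for f in folds}
        print(run, sizes)
```

In each run, every fold serves once as the test set while the remaining nine folds are used for training.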

EXPERIMENTS, RESULTS AND DISCUSSIONS

MBD of Brain Slices
The MBD of an MS slice is shown in Fig. 8. Figure 8a gives the original MS slice, and Figs. 8b-8f give the MBD estimation with grid scale G of 16, 8, 4, 2 and 1, respectively. The numbers of boxes in these five situations are presented in Table 2.
The MBD in Fig. 8 was performed on the edges extracted by CED. Why does CED perform better than other edge detectors? The reason is that it combines several successful components: Gaussian filter, intensity gradient detector, non-maximum suppression, double threshold and edge tracking. All these components contribute to the effectiveness of CED. An adaptive-threshold CED was proposed by Huo and Wei [64]. In the future, we shall test this adaptive-threshold CED method. A shortcoming of MBD is its computational complexity, which stems from the calculation of Euclidean distance involving floating-point arithmetic. Rossales and Luppe [65] proposed a dedicated hardware implementation scheme, which we shall test in our future research. Other advanced dimension estimation methods shall also be tested in the future. The generalized M-set [66,67] may give equivalent or better performance than MBD.

Statistical Analysis
In the statistical analysis, each fold contains 88 MS slices and 88 healthy slices. The sensitivities, specificities and accuracies of our method over 10 runs of 10-fold cross-validation are presented in Tables 3-5, respectively. We can observe that our method achieved a sensitivity of 97.78±1.29%, a specificity of 97.82±1.60%, and an accuracy of 97.80±1.40%.
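The three measures and their mean ± standard deviation summary can be computed as in the sketch below. It assumes that MS is the positive class and that the reported spread is the sample standard deviation over runs; both are assumptions, and the function names and toy values are illustrative.

```python
import statistics


def sensitivity_specificity_accuracy(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def mean_and_std(values):
    """Mean and sample standard deviation, e.g. over the 10 runs."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]            # toy labels, not study data
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(sensitivity_specificity_accuracy(y_true, y_pred))
    print(mean_and_std([1.0, 2.0, 3.0]))
```

Sensitivity is the fraction of MS slices correctly identified, specificity the fraction of healthy slices correctly identified, and accuracy the fraction of all slices correctly classified.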

Training Algorithm Comparison
We used the proposed TSR-BBO algorithm to train the classifier. The comparison basis includes GA [22], PSO [23], SA [24], BBO [25], etc. The maximum iteration number is 1000, and the population size is set to 20. The parameters of these algorithms were obtained by experience and are listed in Table 6.

Table 6 Algorithm parameters.
Algorithm  Parameters
GA [22]  Crossover Probability = 0.8, Mutation Probability = 0.05
PSO [23]  Both Acceleration Factors = 1, Inertia = 0.5, Maximum Velocity = 1
SA [24]  Initial Temperature = 100, Final Temperature = 0, Cooling Function = Linear
BBO [25]  Elite Number

The comparison in terms of accuracy is listed in Table 7, and the boxplot is presented in Fig. 9. We can observe that the proposed TSR-BBO yields the highest accuracy of 97.80±1.40%; the second is BBO [25] with an accuracy of 97.28±1.99%, and the third is PSO [23] with an accuracy of 96.86±1.03%. GA [22] ranks fourth with an accuracy of 89.39±2.25%, and SA [24] yields the worst accuracy of 83.32±3.35%.

Comparison to State-of-the-Art Methods
Finally, our method "CED + MBD + SHLNN + TSR-BBO" was compared to seven state-of-the-art methods: MAMFM + SVM [12], 2D-DWT + RF [13], BWT + LR [14], SWE + DT [16], SWE + SVM [16], AIMDSS [17], and WE [18]. The results in Table 8 show that our method achieved the highest sensitivity and accuracy among the eight algorithms. The highest specificity was achieved by SWE + DT [16] with a value of 98.30%, the second highest by BWT + LR [14] with a value of 98.25±0.16%, and our method yielded the third highest specificity with a value of 97.82±1.60%. Considering all three measures together, our method performed the best among all algorithms. The improvement of our method over the other algorithms is slight, yet it was obtained by a strict cross-validation experiment.

CONCLUSIONS
Our team has proposed a novel MS slice identification method. We processed the unbalanced dataset using SMOTE. Then, we used MBD to extract features, and proposed the TSR-BBO method to train the classifier.
In the future, we shall try to collect data from more MS patients than were used in this study. Advanced classifiers, such as convolutional neural networks and stacked sparse autoencoders, may be used. Besides, we may use the multi-atlas method [68,69] to perform brain segmentation.

1740010, Fractals 2017.25.

Fig. 7 Flowchart of our proposed MS identification system.