Unsupervised calibration for noninvasive glucose-monitoring devices using mid-infrared spectroscopy

Ryosuke Kasahara*†, Saiko Kino, Shunsuke Soyama and Yuji Matsuura†‡
*Ricoh Institute of Information and Communication Technology, Research and Development Division, Ricoh Company, 2-7-1 Izumi, Ebina 243-0460, Japan
Graduate School of Engineering, Tohoku University, 6-6-05 Aoba, Sendai 980-8579, Japan
Graduate School of Biomedical Engineering, Tohoku University, 6-6-05 Aoba, Sendai 980-8579, Japan
ryohsuke.kasahara@jp.ricoh.com


Introduction
In recent years, the incidence of diabetes has increased worldwide, increasing the market demand for noninvasive blood glucose-monitoring technologies. Various methods have been proposed for noninvasive blood glucose measurements, including near-infrared sensing [1-3], mid-infrared sensing [4], Raman spectroscopy [5], and photoacoustics [6,7]. Of these options, the mid-infrared light spectrum offers good detection accuracy because glucose absorbs light particularly well at these wavelengths. However, practical applications of noninvasive blood glucose measurement technologies are limited by their measurement accuracy [8], and they require invasive calibration steps for practical use.
Calibration is vital to ensure that glucose measurements are robust against individual differences and that the noninvasive measurement is maximally correlated with direct glucose concentration readings [9-12]. In these calibration routines, blood sampling is indispensable to prepare training data; to obtain highly accurate measurements, the blood glucose level must be invasively measured at least once. We propose a novel method to calibrate noninvasive glucose measurements with spectral data alone, which could lead to a truly noninvasive blood glucose-monitoring system.
Machine-learning techniques have been used to analyze a wide range of biomedical data for applications including spectral analysis and disease prediction [13-15]. Previous studies have focused on creating regression models using supervised learning. Ordinary supervised learning requires training data that are labeled with reference measurements. To avoid the need for a labeled training dataset, we applied domain adaptation (DA) using a deep neural network (NN) to learn a model. Given a model created from original training data, DA can be used to adapt this model to different, unlabeled data whose distribution differs from that of the original training data. DA is used, for example, when adapting one person's spam filter model to other people's mail, whose content distribution differs [16].

Materials and Methods
Previously, a method that measures light spectrum absorbance in the oral mucosa was proposed [17]. We have developed a technique that transmits optical signals using an attenuated total reflection (ATR) prism and a hollow optical fiber [18] that efficiently propagates mid-infrared light [19]. We previously reported the accuracy of noninvasive blood glucose level measurements obtained using this device [20,21]; Fig. 1 shows an outline of the measurement system. We measured the absorbance of the oral mucosa using an ATR prism held between the subject's lips, in contact with the inner lip. The ATR prism is made of ZnS. Two Fourier-transform infrared spectroscopy (FTIR) devices (Tensor 27, Bruker, Billerica, Massachusetts, US and Vertex 70, Bruker) are used as mid-infrared spectrometers. Measurements are recorded using ATR prisms (prism 1 and prism 2) of different thicknesses in order to check whether the prediction model of blood glucose level depends on the design of the prism.
Tissues are generally considered turbid media with high scattering and absorption, and in the mid-infrared region light penetrates only shallow tissue layers. Because the experimental setup used in this study employed an ATR prism as the probe, light transmitted through the tissue was not used; only the few microns of tissue reached by the light reflecting off the prism surface were measured. Since the light undergoes a large number of reflections at the surface of the ATR prism, it was possible to sense the surface conditions with a good signal-to-noise ratio (SNR). The oral mucosa was found to be a suitable tissue for ATR-based sensing because it does not include an epidermis layer. In this study, we investigated the effect of the interstitial fluid on the glucose levels via the vessel wall by applying this method.
To collect reference data, we used two types of self-administered glucose-monitoring devices (Medisafe Mini, Terumo, Tokyo, Japan and OneTouch UltraView, Johnson and Johnson, New Brunswick, New Jersey, US). As the blood glucose levels measured by the two reference devices for the same blood sample differed somewhat, the values measured by the Medisafe Mini were corrected with a linear equation so that they matched those from the OneTouch UltraView. Blood sampling and measurements of blood glucose levels were initially performed every 10 min (shortly after meals, when significant changes were still likely occurring), and the time interval was increased thereafter. The analyses were conducted for approximately 3 h until the blood glucose levels stabilized. Spectral measurements were performed over the same period. However, the timing of blood sampling and the number and timing of spectral measurements did not always match, so the blood glucose levels at the time of the spectral measurements were obtained by linear interpolation of the glucose levels obtained from the blood samples. Each dataset was collated from a series of measurements taken on a single day.
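The interpolation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `interpolate_glucose`, the sample values, and the choice to clamp queries outside the sampled range to the nearest sample are all assumptions made for the example.

```python
from bisect import bisect_right


def interpolate_glucose(sample_times, sample_values, query_time):
    """Linearly interpolate a reference glucose value (mg/dL) at query_time.

    sample_times must be sorted in ascending order (minutes).
    Assumption of this sketch: queries outside the sampled range are
    clamped to the nearest blood sample; the source does not say how
    such cases were handled.
    """
    if query_time <= sample_times[0]:
        return sample_values[0]
    if query_time >= sample_times[-1]:
        return sample_values[-1]
    # Index of the first blood sample taken strictly after query_time.
    i = bisect_right(sample_times, query_time)
    t0, t1 = sample_times[i - 1], sample_times[i]
    v0, v1 = sample_values[i - 1], sample_values[i]
    return v0 + (v1 - v0) * (query_time - t0) / (t1 - t0)


# Hypothetical blood samples: minutes after the meal, glucose in mg/dL.
blood_times = [0, 10, 20, 40, 70]
blood_glucose = [95.0, 120.0, 150.0, 140.0, 110.0]

# Hypothetical spectral measurement times that fall between blood samples.
spectrum_times = [5, 15, 30, 55]
labels = [interpolate_glucose(blood_times, blood_glucose, t) for t in spectrum_times]
```

Each spectrum then receives a reference value even when no blood sample was taken at exactly the same moment.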
Two datasets, the characteristics of which are listed in Table 1, were prepared from the measured data. The labeled dataset, containing 131 data points from 13 measurement series, was constructed by having the subject eat a variety of meals before the measurements. The unlabeled dataset contains 414 data points from 18 measurement series. The subject in the labeled dataset was a healthy male in his mid-20s. The subjects in the unlabeled dataset were four healthy men and one healthy woman, aged 30 to 50. Only one person participated in each series. For this dataset, the measurements were performed after eating various meals or drinking a glucose solution (75 g of glucose dissolved in 150 mL of water). The unlabeled dataset also includes data acquired with different ATR prisms and different FTIR devices. Parentheses in the table indicate the number of the corresponding measurement series. Our protocol was approved by the ethical committee on the Use of Humans as Experimental Subjects of Tohoku University, and informed consent was obtained from all examinees. Table 2 shows the minimum, maximum, and average values (as well as the variance) obtained from the data points for each series of the datasets.
For practical use of a noninvasive glucose-measuring device, we assumed the following handling of the two datasets: the device manufacturer records the labeled dataset on the device before shipment in order to create the predictive model, and the user acquires the unlabeled dataset after shipment, for which the blood glucose levels are then predicted.

Calculations
Previously, we found that in the mid-infrared spectrum, even when only three appropriately chosen wavenumbers are used for predicting blood glucose levels, the regression correlations were equal to or higher than those obtained when using more wavenumbers [22]. Partial least squares regression (PLS) [23], support vector machines [24], and NN models [25,26] have all been proposed as models to predict blood glucose levels from spectrum data. In addition, to avoid the difficulty of labeling a large volume of data, transfer learning is often used to apply learning results from one classification task to another. Among transfer learning techniques, DA allows machine-learning models to successfully predict test datasets when the training dataset has a different distribution from the test dataset. Domain-adversarial NNs (DANNs) [27] have been proposed as an implementation of DA. Adversarial updating in deep NNs improves prediction accuracy. We used a DANN to calibrate the algorithm that associates spectral data with blood glucose levels; the ability of a DANN to learn from unlabeled data allows calibration without data labeled by invasive blood sampling. Figure 2 shows the process flow for the preprocessing, training, and evaluation of regression results. We applied series cross-validation and multiple linear regression (MLR) models to select appropriate mid-infrared wavenumbers for noninvasive blood glucose measurements, and only wavenumbers that showed a high correlation regardless of conditions were selected [22]. The selected wavenumbers were 1050, 1070, and 1100 cm⁻¹. To reduce disturbances, such as those caused by the contact pressure of the prism, the absorbance at 1000 cm⁻¹, which has low absorption, was used for normalization. Therefore, the wavenumbers used for regressions were 1050 cm⁻¹, 1070 cm⁻¹, and 1100 cm⁻¹, normalized at 1000 cm⁻¹ for all data.
Certain experimental conditions, especially the contact pressure and temperature, can affect spectroscopy measurements. Temperature changes are superimposed on the measured value as the blackbody radiation spectrum of the light source of the FTIR device, although this effect can be eliminated by subtracting the background from the FTIR measurement. The contact pressure of the ATR prism can also affect the results, but this effect is mostly canceled out by the normalization at 1000 cm⁻¹. Because of the time taken by glucose to pass from the blood vessels to the tissue fluids [28], we used the measured value from the self-measuring devices delayed by 26 min [22]. Once the labeled and unlabeled datasets had been subjected to this preprocessing, the labeled dataset was used as training data and each series of the unlabeled dataset was used as test data.
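The preprocessing described above can be sketched in a few lines. The wavenumbers and the 26 min delay are taken from the text; everything else is an assumption of this sketch. In particular, the text does not state whether normalization is a ratio or a difference, so the ratio used here is only one plausible reading, and the function names and example absorbance values are hypothetical.

```python
SELECTED_WAVENUMBERS = (1050, 1070, 1100)  # cm^-1, from the text
REFERENCE_WAVENUMBER = 1000                # cm^-1, from the text
DELAY_MIN = 26                             # minutes, from the text


def normalize(spectrum):
    """Return the three regression inputs for one spectrum.

    spectrum maps wavenumber (cm^-1) to absorbance.
    Assumption: normalization divides by the absorbance at 1000 cm^-1;
    the source only says the data are "normalized at 1000 cm^-1".
    """
    reference = spectrum[REFERENCE_WAVENUMBER]
    return [spectrum[w] / reference for w in SELECTED_WAVENUMBERS]


def reference_time_for(spectrum_time_min):
    """Time of the blood glucose reading paired with a spectrum.

    Assumption: the spectrum at time t is paired with the blood glucose
    level measured 26 min earlier, reflecting the lag between blood and
    tissue fluid.
    """
    return spectrum_time_min - DELAY_MIN


# Hypothetical absorbance values for one measurement taken at t = 60 min.
example_spectrum = {1000: 0.50, 1050: 0.60, 1070: 0.55, 1100: 0.45}
features = normalize(example_spectrum)
paired_time = reference_time_for(60)
```

A common scale factor, such as one produced by a change in contact pressure, cancels in the ratio, which is the motivation the text gives for normalizing.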

Process flow
By testing each series in turn, we evaluated the regression accuracy over all the data in the unlabeled dataset. When applying DA, the unlabeled dataset was also used as an unlabeled training dataset. Figure 3 illustrates the grouping of datasets. Differences in color and shape in the figure indicate differences in series. All series in the labeled dataset are labeled with blood glucose concentration, unlabeled data from one series in the unlabeled dataset are used for training, and the same series of the unlabeled dataset is used for testing. This process is repeated for each series in the unlabeled dataset, and the regression accuracy is calculated. In the training step, labels from the unlabeled dataset are not included. Therefore, even though the same series from the unlabeled dataset is used in both training and testing, the true value of blood glucose concentration is not provided during training. The number of data points was 131 for supervised training; the average number of data points per series was 23 for unsupervised training and 23 for testing. No validation data were used. The unlabeled dataset represents the data acquired by the user of the glucose-monitoring device, and we assumed that it carried no blood glucose labels. Therefore, although it was used for DA without labels, it was not used for supervised learning. Figure 4 shows the configuration of the network used as a regressor of blood glucose levels. The network input is a set of absorbance values at 1050 cm⁻¹, 1070 cm⁻¹, and 1100 cm⁻¹. L_x and Lc_x are the layers of the network used for regression and classification, respectively, and w_x and wc_x denote the weights in the corresponding layers. A leaky rectified linear unit [29,30] with a gradient a_i = 0.2 in the negative region is used as the activation function. Batch normalization [31] is also used for each layer. Adam [32] is used for optimization.
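The per-series evaluation protocol described above can be sketched as a short loop. This is an illustration under stated assumptions: `evaluate_per_series`, `mean_baseline`, and the toy data are hypothetical, and the stand-in model simply predicts the mean labeled glucose value so that the example runs without a neural network. The point it shows is that labels of the unlabeled series are stripped before training and used only for scoring.

```python
def evaluate_per_series(labeled, unlabeled_series, fit_and_predict):
    """Adapt and test on each unlabeled series in turn.

    labeled: list of (features, glucose) pairs.
    unlabeled_series: dict mapping a series name to (features, glucose) pairs.
    fit_and_predict(labeled, features) returns one prediction per feature
    vector; it never receives the glucose values of the unlabeled series.
    Returns (true_value, prediction) pairs pooled over all series.
    """
    scored = []
    for series in unlabeled_series.values():
        features = [x for x, _ in series]  # labels stripped before training
        truths = [y for _, y in series]    # held back for scoring only
        predictions = fit_and_predict(labeled, features)
        scored.extend(zip(truths, predictions))
    return scored


def mean_baseline(labeled, unlabeled_features):
    """Hypothetical stand-in model: predict the mean labeled glucose level."""
    mean = sum(y for _, y in labeled) / len(labeled)
    return [mean for _ in unlabeled_features]


# Toy data: three normalized absorbances per point, glucose in mg/dL.
labeled = [((1.20, 1.10, 0.90), 100.0), ((1.30, 1.15, 0.95), 140.0)]
unlabeled_series = {
    "series_a": [((1.25, 1.12, 0.92), 110.0), ((1.28, 1.14, 0.94), 130.0)],
    "series_b": [((1.10, 1.05, 0.88), 90.0)],
}
scored = evaluate_per_series(labeled, unlabeled_series, mean_baseline)
```

In the study, the role of `fit_and_predict` is played by training the network on the labeled dataset plus the unlabeled spectra of one series, then predicting that same series.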
Figure 5 shows the training procedure used. First, in step 1, we train the network to regress the blood glucose level from the absorbance data in the labeled dataset. Then, in step 2, unlabeled absorbance data from one series of the unlabeled dataset are added as input data and the network is trained to distinguish data in the labeled dataset from data in the unlabeled dataset. In step 3, the regression weights before the branch (w_1, w_2) are updated so that the labeled and unlabeled datasets are indistinguishable to the network. With this step, the blood glucose levels can be regressed at the output of L_3 by extracting the features that are common to the labeled and unlabeled datasets. Adversarial updating in steps 2 and 3 increases the regression accuracy by overlaying the distributions of the labeled and unlabeled datasets in layers L_1 to L_3. Therefore, by adjusting for deviations between the distributions of a series from the labeled and unlabeled datasets, the network can estimate the blood glucose level for both datasets. Training starts with step 1, which uses the supervised data of the labeled dataset and trains the weights w_1 to w_4 for 18,000 epochs. After these 18,000 epochs, all three steps, including steps 2 and 3, are executed together for 8000 epochs, and this part of the training also uses the unlabeled data in the unlabeled dataset. To balance the regression and DA processes, step 3 is executed only in iterations in which the regression loss in step 1 is < 320. The loss value in step 3 is multiplied by 350 to balance it with the loss values in steps 1 and 2. The training process runs for 26,000 epochs in total. All training data were randomized at each epoch. The loss functions used are the Euclidean loss for regression in step 1 and the softmax cross entropy for classification in steps 2 and 3. Note that an epoch in this report refers to a period in which all data are used for training once.
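The schedule described above can be summarized in a small sketch. The numeric constants come from the text; the function names, the 0-based epoch index, and the idea of returning the active step numbers as a tuple are assumptions made for illustration. The network updates themselves are omitted.

```python
PRETRAIN_EPOCHS = 18_000      # step 1 only, from the text
ADAPT_EPOCHS = 8_000          # steps 1 to 3 together, from the text
REGRESSION_LOSS_GATE = 320.0  # step 3 runs only below this step-1 loss
STEP3_LOSS_WEIGHT = 350.0     # multiplier applied to the step-3 loss


def steps_for_iteration(epoch, regression_loss):
    """Return the step numbers that run in an iteration of a given epoch.

    epoch is 0-based (an assumption of this sketch).
    regression_loss is the step-1 loss observed in this iteration.
    """
    if epoch < PRETRAIN_EPOCHS:
        return (1,)       # supervised regression pretraining
    if regression_loss < REGRESSION_LOSS_GATE:
        return (1, 2, 3)  # regression, domain classifier, adversarial update
    return (1, 2)         # regression too poor: skip the adversarial update


def weighted_step3_loss(raw_step3_loss):
    """Scale the step-3 loss to balance it against steps 1 and 2."""
    return STEP3_LOSS_WEIGHT * raw_step3_loss


TOTAL_EPOCHS = PRETRAIN_EPOCHS + ADAPT_EPOCHS
```

Gating step 3 on the regression loss keeps the adversarial update from distorting the shared features while the regressor is still inaccurate, which is the balancing role the text assigns to the threshold.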
The values of 18,000 and 8000 epochs were chosen to obtain the best correlation coefficient over all the test data series (414 samples). In principle, the parameter search should exclude the test data series being evaluated. However, in this study, this would mean removing only one of the 18 series from each parameter search, and since only a small number of parameters needed to be optimized, the possibility of overfitting was relatively low. Thus, we adopted this procedure because of limitations on calculation time.

Results and Discussion
The average values and variances obtained for the three wavelengths of the two datasets are presented in Table 3. Individual differences and variations in the measurement environments (i.e., prism and FTIR device differences) could explain the variation in the observed values. Figure 6 shows the change of loss values at each epoch of the training process. Each loss decreases as training progresses. A decrease in the loss value of step 1 indicates that the network is learning the regression for the labeled dataset. If the loss value of step 2 is decreasing, the network is learning to classify data as belonging to the labeled or unlabeled dataset. If the loss value of step 3 is decreasing, the training is progressing such that the distributions of the labeled and unlabeled datasets overlap in the middle of the regression network. If these losses are reduced and well balanced, the network realizes DA. Figure 7 shows a comparison of data distributions with and without DA in a typical series from the unlabeled dataset. Figure 7(a) shows the data input to the L_1 layer, and Fig. 7(b) shows the output from the L_3 layer. The three-dimensional spectrum values of the datasets are reduced to two dimensions using principal component analysis and plotted. The colors and shapes in the graph indicate whether the data points are from the labeled or unlabeled dataset. At the input stage, the distributions of the labeled and unlabeled datasets are clearly shifted. In the output from the L_3 layer, however, the distributions are clearly superimposed, which shows that this network adjusts for the differences between the subjects of the labeled and unlabeled datasets.
Next, Fig. 8 shows a comparison of prediction accuracy with and without DA, using the Clarke error grid [33], which is a commonly used scatter diagram for evaluating the accuracy of glucose-measuring devices. Figure 8(a) shows the Clarke error grid for the unlabeled dataset with the prediction model trained on the labeled dataset with only step 1 executed, so that DA was not applied. Figure 8(b) shows the Clarke error grid for the unlabeled dataset with the prediction model trained by executing all three steps, so that DA was applied.
For the prediction model trained without DA, the correlation coefficient was 0.38, and 53.6% of predictions fell into region A of the Clarke error grid. For the prediction model trained with DA, the correlation coefficient was 0.47, and 63.8% of predictions fell into region A. This indicates that DA improves the correlation coefficient and prediction accuracy. This result also shows that calibration of the prediction model without blood samples is possible with DA. Note that the test data were measured under various conditions, including different recent meals, subjects, and experimental setups. Therefore, this correlation result also indicates that the regression performance of the model holds across these conditions. Next, we compare the results obtained with different machine-learning models. Table 4 shows the correlation coefficient and the percentage of data points in region A of the Clarke error grid for the various models. It also shows the root mean square error and the mean absolute difference for the obtained data. The results for MLR and PLS are reproduced from a previous study [22] in which the data were the same as in this study. Table 4 also lists results from an NN without DA and the present test with a DANN using DA. All methods were evaluated under the same conditions, in which calibration with blood sampling was not performed. For methods other than DANN, calibration was not performed for each series of the unlabeled dataset. PLS also has a function for wavenumber selection, so its inputs included wide-spectrum absorbance values (every 2 cm⁻¹ from 980 to 1200 cm⁻¹). The other models' input wavenumbers were 1050, 1070, and 1100 cm⁻¹.
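The figures of merit reported above can be computed as in the following sketch. It is not the authors' evaluation code. The region A test uses a common formulation of the Clarke error grid (prediction within 20% of the reference, or both values below 70 mg/dL); boundary conventions differ slightly between implementations. The mean absolute difference is assumed here to be in mg/dL rather than relative, and the example values are hypothetical.

```python
import math


def pearson_r(reference, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)


def rmse(reference, predicted):
    """Root mean square error in mg/dL."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, predicted)) / n)


def mean_abs_diff(reference, predicted):
    """Mean absolute difference in mg/dL (assumed absolute, not relative)."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)


def in_region_a(reference, predicted):
    """Region A: within 20% of the reference, or both values below 70 mg/dL."""
    if reference < 70 and predicted < 70:
        return True
    return abs(predicted - reference) <= 0.2 * reference


def region_a_percent(reference, predicted):
    hits = sum(in_region_a(r, p) for r, p in zip(reference, predicted))
    return 100.0 * hits / len(reference)


# Hypothetical reference and predicted glucose values in mg/dL.
reference = [100.0, 150.0, 60.0, 200.0]
predicted = [110.0, 110.0, 65.0, 190.0]
print(pearson_r(reference, predicted), rmse(reference, predicted))
print(mean_abs_diff(reference, predicted), region_a_percent(reference, predicted))
```

In this toy example, three of the four points fall in region A; the second point misses because its 40 mg/dL error exceeds 20% of the 150 mg/dL reference.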
The results show that PLS, which is generally used for spectrum analysis, does not yield acceptable results. We suspect that this poor performance is an effect of overfitting, because the number of input wavenumbers was large relative to the number of data points. NN models can handle nonlinear components, which we assume explains their slightly better accuracy compared with MLR. DANN shows the best results of the methods we tested.
The prediction accuracy of the method was also evaluated on a data series for a subject when DA was applied using a different data series from the same person. However, the results varied significantly depending on which data series was used for the DA. For example, when DA was applied using different data from the same person, good results were not always obtained, which may be attributed to differences in the data-acquisition environment and the meal content, even for the same subject. However, in actual use, DA can be applied each time a data series is acquired. Therefore, the user of the glucose-monitoring device could improve the prediction accuracy for the acquired data series by applying the DA method.
We also evaluated the prediction accuracy using series cross-validation in the case where all the data from the labeled and unlabeled datasets were used for supervised training. The obtained correlation coefficient was 0.35, which was lower than that of the NN trained with only the labeled dataset. This is because, although the measurement conditions of the labeled dataset were stable, those of the unlabeled dataset varied widely, so the NN could not learn a suitable regression model for blood glucose levels. For this reason, it is better to learn the regression model using data measured under stable conditions and to apply DA with a DANN for each series.
There are a few potential reasons for the shifting or distortion of the measurement series. First, there are individual differences in metabolic systems, and since this method does not directly observe glucose, it is affected by metabolism. Second, different meal content has different effects on metabolism. Third, there are individual structural differences in the skin-depth direction [34]. Fourth, there are structural differences in the horizontal direction, which are also related to the position of the ATR probe [35]. For differences related to optical issues (the third and fourth reasons), calibration can be performed by selecting the optimal area with respect to the depth and horizontal directions. In contrast, our approach is based on a machine-learning technique. By combining the DA technique with the optical approach, it is possible to calibrate not only for optical differences but also for metabolism-based differences, which is intended to serve as a foundation for further improvements of the method. Using the proposed DA method, blood sampling at the time of calibration is rendered unnecessary. This removes a major barrier to using noninvasive glucose-monitoring technology. Without requiring invasive calibration, prediction accuracy can be improved at the time of actual use, which could make noninvasive glucose-monitoring systems available for home use. Additionally, this method can be applied more generally to other medical measurements that usually need to be calibrated for each individual using invasively obtained samples. For example, the technique can be applied to noninvasive measurements of other blood components.

Conclusion
We developed a method for calibrating mid-infrared spectral blood glucose data without training data derived from blood samples. We showed that a DANN performing DA can be applied to unsupervised calibration with unlabeled spectral data. The training process was successful and improved the correlation coefficient and prediction error over comparable methods. Using the method, blood sampling at the time of calibration becomes unnecessary and the accuracy of prediction of noninvasive glucose-monitoring systems for home use can be improved. In the future, we plan to test the accuracy of this method with data collected from a device that uses a laser light source, which would be a more practical approach for patient use.

Conflict of Interest
The authors declare that there are no conflicts of interest related to this article.