Near-infrared chemical imaging for quantitative analysis of chlorpheniramine maleate and distribution homogeneity assessment in pharmaceutical formulations

Near-infrared chemical imaging (NIR-CI) combines conventional near-infrared (NIR) spectroscopy with chemical imaging and thus provides spectral and spatial information simultaneously. It can be used to visualize the spatial distribution of the ingredients in a sample. The data acquired with an NIR-CI instrument form a hyperspectral data cube (hypercube) containing thousands of spectra, and chemometric methods are needed to transform this spectral information into chemical information. In this work, the partial least squares (PLS) method was used to extract chemical information on chlorpheniramine maleate (CPM) in pharmaceutical formulations. A series of samples with different CPM concentrations (w/w) was compressed into tablets and their hypercubes were measured. The spectra extracted from the hypercubes were used to establish the PLS model of CPM. The model gave R²val 0.981, RMSEC 0.384%, RMSECV 0.483% and RMSEP 0.631%, indicating that it was reliable.


Introduction
Near-infrared chemical imaging (NIR-CI) is an emerging technique that combines conventional near-infrared (NIR) spectroscopy with chemical imaging to provide spectral and spatial information simultaneously [1]. Traditional single-point NIR spectroscopy obtains a bulk average spectrum that reflects the average composition of the sample. NIR-CI adds a spatial dimension to NIR spectroscopy, which gives it the ability to acquire information on the distribution of ingredients in the sample.
In NIR-CI, a spectrum is recorded at each pixel of the sample, and together all the spectra make up a hyperspectral data cube [2]. Once the spectral signature of each pixel has been converted into chemical information (i.e., concentration), chemical images can be generated that reflect the distribution of components in the sample. NIR-CI also has the advantages of being rapid and nondestructive and of requiring no sample pretreatment. NIR-CI can therefore improve process and product understanding, which is consistent with the process analytical technology (PAT) initiative [3] encouraged by the Food and Drug Administration (FDA) in the pharmaceutical field.
The ability to obtain spectral and spatial information on components simultaneously has made NIR-CI a promising PAT tool for the control of pharmaceutical manufacturing processes and the quality assessment of final products. Applications of NIR-CI include assessing the homogeneity of mixtures during blending [4-7], visualizing the spatial distribution of components in intermediate and final products [8,9] and discriminating counterfeit pharmaceutical products [10-12]. Further pharmaceutical applications of NIR-CI are covered in previous reviews [13,14]. Among these applications, NIR-CI is especially suited to assessing the distributional homogeneity of a component in the sample, because it provides spatial information.
However, the data measured by NIR-CI form a three-dimensional hypercube. The spectral signatures of the ingredients overlap, and chemometric methods are required to obtain relevant qualitative or quantitative information. The hypercube can be analyzed by either three-way or two-way methods, and two-way methods have been reported to be more suitable for this type of data [15]. The hypercube has to be unfolded into a two-dimensional matrix before conventional two-way methods can be applied. The most commonly used are univariate methods (i.e., the characteristic wavenumber method) and multivariate methods such as partial least squares (PLS) and classical least squares. Finally, the two-dimensional matrix is refolded to restore the spatial location of each pixel and reconstruct the chemical images.
Chlorpheniramine maleate (CPM) is an H1 receptor antagonist with strong antihistamine activity [16]. It is used clinically to alleviate cold symptoms and to treat allergic diseases. According to the Chinese Pharmacopoeia (2010 Edition, Volume II), the labeled content of CPM in CPM tablets is 1 mg or 4 mg, and the CPM content of a tablet is about 1-5% (w/w). The distribution of the active pharmaceutical ingredient (API) plays an important role in both the safety and the efficacy of a medicine, especially for low-dose products and sustained- or controlled-release products. It is therefore necessary to assess the distributional homogeneity of CPM in the tablet. In this study, CPM tablets prepared in-house were taken as examples. NIR-CI coupled with the PLS method was used to obtain the concentration information of CPM, and concentration images were then reconstructed for the assessment of distributional homogeneity.

Materials
A four-ingredient pharmaceutical tablet formulation was used to produce the NIR-CI data set. The API of the tablet was CPM, provided by Haohua Industry Corporation (Jinan, P.R. China). The main excipients were pregelatinized starch (STA) and microcrystalline cellulose (MCC), purchased from Colorcon (USA) and Beijing FengliJingqiu Commerce and Trade Corporation (P.R. China), respectively. Magnesium stearate (MgS), purchased from Sinopharm Chemical Reagent Corporation (P.R. China), served as the lubricant.

Sample preparation
The calibration data set comprised 33 batches and was designed with a D-optimal formulation design using Design Expert 7.0 software (USA). The content ranged from 1% to 10% for CPM, from 20% to 90% for pregelatinized starch and from 10% to 80% for MCC. The content of magnesium stearate was fixed at 0.4% (w/w) because only a small amount was added on top of the formulation. Table 1 shows the contents of the four ingredients in each of the 33 calibration batches.
The dry-blend formulation was mixed in a blender using the equal incremental method and compressed into 0.5 g tablets by direct compression on a rotary tablet press (Xinyuan Pharmaceutical Machinery Corporation, Shanghai, P.R. China). The press was set to a compression pressure of 60 kN, a filling depth of 5.0 mm and a tablet thickness of 2.0 mm. A flat punch was used to obtain a flat sample surface. In addition, three batches were produced in the same way as a prediction set to test the performance of the PLS model. The ingredient contents of the prediction set are also listed in Table 1. The CPM contents of the prediction set were 4.5%, 5.5% and 6.5%, which lie within the CPM range of the calibration set.

Hyperspectral data acquisition
To ensure representative sampling for the calibration set, one tablet was taken from the start, the middle and the end of the tableting process of each batch. A total of 99 samples were therefore imaged (3 tablets from each of the 33 calibration batches). For the three batches of the prediction set, one tablet per batch was compressed and imaged.
Each sample was fixed onto a microscope slide and measured directly on the tablet surface. A NIR line-mapping instrument (Spotlight 400N FT-NIR Imaging Systems, PerkinElmer, UK) was used to analyze the samples. Its linear mercury cadmium telluride (MCT) array detector collects 16 spectra in one measurement. An area of 1000 μm × 1000 μm (calibration set) or 2000 μm × 2000 μm (prediction set) was imaged with a pixel size of 25 μm × 25 μm and a spectral resolution of 16 cm⁻¹, giving 1600 spectra per image for the calibration set and 6400 for the prediction set. Each spectrum was the average of 16 scans, and the wavenumber range was 7800 to 4000 cm⁻¹.
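The spectra counts quoted above follow directly from the imaged area and the pixel size. A minimal check is sketched below; the function name is illustrative and not part of the instrument software.

```python
def spectra_per_image(side_um: int, pixel_um: int) -> int:
    """Number of pixel spectra in a square image of the given side length (micrometres)."""
    pixels_per_side = side_um // pixel_um
    return pixels_per_side ** 2


print(spectra_per_image(1000, 25))  # calibration images: 1600
print(spectra_per_image(2000, 25))  # prediction images: 6400
```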
A high-reflectance standard, Spectralon™ (Labsphere, Inc., North Sutton, New Hampshire), was used as the background to correct the instrument response. Relative NIR diffuse reflectance data ($R = R_{\mathrm{sample}}/R_{\mathrm{background}}$) were thus obtained and converted into absorbance data ($A = \lg(1/R)$) for further analysis.
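The background correction and absorbance conversion can be sketched as follows, assuming the usual definition A = lg(1/R) and single-beam intensities stored as plain lists. The function names and example values are hypothetical.

```python
import math


def relative_reflectance(sample, background):
    """Point-by-point ratio of the sample signal to the reference (background) signal."""
    return [s / b for s, b in zip(sample, background)]


def absorbance(reflectance):
    """Convert relative reflectance R to absorbance A = lg(1/R)."""
    return [math.log10(1.0 / r) for r in reflectance]


# Hypothetical intensities at three wavenumbers
R = relative_reflectance([20.0, 35.0, 50.0], [100.0, 100.0, 100.0])
print(absorbance(R))
```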

Data processing
The NIR-CI data form a hyperspectral data cube ($M = X \times Y \times \lambda$, where $X$ and $Y$ are the spatial dimensions and $\lambda$ is the spectral dimension). Commonly, the three-dimensional array is unfolded into a two-dimensional matrix ($XY \times \lambda$) so that two-way methods can be applied. Although multivariate approaches can analyze the spectral data over the entire measured wavenumber range, proper variable selection can improve the precision of a multivariate method in some cases. Synergy interval PLS (SiPLS) was used to select the optimal wavenumber ranges for the PLS models.
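The unfolding step can be illustrated with nested lists, assuming the cube is indexed as cube[x][y][k] and that pixels are unfolded in row-major order. Both the layout and the function name are assumptions made for this sketch.

```python
def unfold(cube):
    """Unfold cube[x][y][k] into a matrix with X*Y rows (pixels) and one column per wavenumber."""
    return [list(spectrum) for row in cube for spectrum in row]


# Toy cube: 2 x 2 pixels, 3 spectral channels
cube = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]],
]
matrix = unfold(cube)
print(len(matrix), len(matrix[0]))  # 4 rows (pixels), 3 columns (channels)
```

Row-major order is convenient here because the same ordering can later be reversed to put each predicted value back at its original pixel location.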
In addition, appropriate preprocessing was applied to reduce the impact of nonchemical information in the image. Several of the most commonly used approaches were tested in this study, including Savitzky-Golay (SG) derivative transformation, multiplicative scatter correction (MSC) and standard normal variate (SNV) [17].
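Two of these corrections are simple enough to sketch directly. The code below assumes the common definitions: SNV centres and scales each spectrum by its own mean and standard deviation, and MSC regresses each spectrum on the mean spectrum and removes the fitted offset and slope. The SG derivative is not shown, and the function names are illustrative.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    ref = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        # Least-squares fit x = intercept + slope * ref
        slope = sum((r - ref_mean) * (v - x_mean) for r, v in zip(ref, x)) / sxx
        intercept = x_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected


base = [0.2, 0.5, 0.9, 0.4]
scaled = [2.0 * v + 1.0 for v in base]  # same shape, different offset and gain
print(msc([base, scaled]))
```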

PLS modeling and image reconstruction
PLS is a multivariate regression method used to construct quantitative calibration models. It is based on the relation between the spectral signals (X) and the reference values (Y) [18]. The spectral data are correlated with the property matrix so that the information of interest is retained while interfering signals from other spectral factors are removed.
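For a single response such as the CPM content, the regression can be sketched with the textbook NIPALS form of PLS1. This is a dependency-free illustration under that assumption, not the routine used in this work; the function names and the toy data are hypothetical.

```python
def pls1_fit(X, y, n_factors):
    """Fit a PLS1 model by NIPALS on mean-centred data; returns what prediction needs."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residual
    f = [v - y_mean for v in y]                              # y residual
    weights, loadings, coefs = [], [], []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]                            # unit weight vector
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        coefs.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "w": weights, "p": loadings, "q": coefs}


def pls1_predict(model, x):
    """Predict the response for one spectrum by sequential projection and deflation."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p_load, q in zip(model["w"], model["p"], model["q"]):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p_load)]
    return y_hat


# Toy one-variable example: y = 2x + 1
model = pls1_fit([[1.0], [2.0], [3.0], [4.0]], [3.0, 5.0, 7.0, 9.0], n_factors=1)
print(pls1_predict(model, [5.0]))
```

Predicting by sequential deflation avoids forming the regression vector explicitly, which keeps the sketch free of matrix inversion.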
In this study, a PLS model of the active pharmaceutical ingredient (CPM) was established. A total of 99 calibration samples were imaged and the mean spectrum of each sample was computed. The three mean spectra of each batch were then averaged, and the resulting 33 batch mean spectra formed the matrix X. The theoretical content (%, w/w) of each batch formed the matrix Y. The sample set was divided by the Kennard-Stone (KS) algorithm into a calibration set (22 samples) and a validation set (11 samples). The PLS model was constructed and optimized with respect to the number of latent factors, the preprocessing method and the wavenumber range. Leave-one-out cross-validation was used as the cross-validation approach.
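The Kennard-Stone split can be sketched as below, assuming the standard formulation with Euclidean distances: start from the two most distant samples, then repeatedly add the sample whose nearest selected neighbour is farthest away. The function name and the toy data are illustrative.

```python
import math


def kennard_stone(samples, n_select):
    """Return (selected, remaining) sample indices chosen by the Kennard-Stone rule."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed with the pair of samples that are farthest apart
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda ij: dist[ij[0]][ij[1]])
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Pick the candidate with the largest distance to its closest selected sample
        nxt = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy data: four one-dimensional "spectra"
calibration, validation = kennard_stone([[0.0], [1.0], [2.0], [10.0]], n_select=3)
print(calibration, validation)
```

For the split described above, the call would take the 33 batch mean spectra with n_select=22, leaving 11 samples for validation.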
After the same pretreatment, the hypercube data of the prediction set were applied to the PLS model. Several parameters were used to assess the predictive capability of the PLS model: the coefficient of determination (R²), the root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV) and the root mean square error of prediction (RMSEP).
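These figures of merit share one formula applied to different sample sets. The sketch below assumes the plain definitions, with the squared error divided by the number of samples and R² computed as one minus the ratio of residual to total sum of squares. Some software divides RMSEC by the degrees of freedom instead, so values may differ slightly. The example numbers are hypothetical.

```python
import math


def rmse(reference, predicted):
    """Root mean square error; gives RMSEC, RMSECV or RMSEP depending on the data passed."""
    n = len(reference)
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n)


def r_squared(reference, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted contents (%, w/w)
ref = [1.0, 4.0, 7.0, 10.0]
pred = [1.3, 3.8, 7.4, 9.6]
print(rmse(ref, pred), r_squared(ref, pred))
```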
The spectral information of each pixel was then converted into a predicted concentration. The concentration image of CPM was generated by refolding the predicted concentration matrix so that each pixel kept its spatial location. The mean concentration of CPM was calculated by averaging the predicted concentrations of all pixels.
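Refolding and averaging can be sketched as follows, assuming the pixel predictions are stored in the same row-major order in which the hypercube was unfolded. The function names and values are illustrative.

```python
def refold(pixel_values, n_rows, n_cols):
    """Reshape a flat list of per-pixel predictions into an n_rows x n_cols image."""
    if len(pixel_values) != n_rows * n_cols:
        raise ValueError("number of pixels does not match the image size")
    return [pixel_values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def mean_concentration(pixel_values):
    """Mean predicted concentration over all pixels of one image."""
    return sum(pixel_values) / len(pixel_values)


# Hypothetical predictions (%, w/w) for a 2 x 3 image
predictions = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
print(refold(predictions, 2, 3))
print(mean_concentration(predictions))
```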
Hyper View software and Spectrum Image software (PerkinElmer, UK) were used for data processing and analysis. Other data analysis was performed with in-house routines written in MATLAB (MATLAB 2009b, MathWorks, USA).

Assessment of distributional homogeneity of CPM
After NIR-CI measurement and data analysis, the spatial distribution images of the components are obtained. The homogeneity of an ingredient can be assessed preliminarily by visual inspection of these images. However, visual inspection is not objective, and it is difficult to compare the homogeneity of different samples quantitatively. A criterion called the "distributional homogeneity index (DHI)" has been proposed to assess the distributional homogeneity of a chemical image. It is based on the continuous-level moving block (CLMB) method and is calculated as the ratio of the areas under the homogeneity curves of the real chemical image and of its randomized counterpart. The closer the DHI value is to 1, the more homogeneous the distribution. Calculating the DHI therefore allows the distributional homogeneity of different samples to be assessed objectively. A more detailed description of the DHI theory can be found in Ref. 19.

Data preprocessing
The three-dimensional data obtained from the NIR-CI instrument were unfolded into a two-dimensional matrix. The spectra are affected by overlapping peaks, spectral noise, baseline drift and similar effects, so preprocessing was applied to improve the accuracy of the model.
Several preprocessing options were applied to the spectral data set: raw spectra, SG smoothing with a 9-point window, SG smoothing with an 11-point window, 11-point SG with first derivative (SG+1D), 11-point SG with second derivative (SG+2D), SNV, MSC and normalization. Leave-one-out cross-validation was used to select the optimal preprocessing method and to investigate the number of latent factors.
The optimum number of latent factors was taken as the one giving the lowest predicted residual sum of squares (PRESS), because the minimal PRESS value indicates a good balance between the robustness of the model and its R² value. A plot of PRESS against the number of latent factors is shown in Fig. 1. The results of the PLS models built with the different pretreatment methods are given in Table 2. The spectra preprocessed by 11-point SG with second derivative required the fewest latent factors and gave the lowest RMSECV and the R² closest to 1, so this was chosen as the best preprocessing method for the PLS model.
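The selection rule can be sketched as follows, assuming that PRESS is the sum of squared differences between the reference values and the leave-one-out predictions, and that one PRESS value is available per candidate number of factors. The function names and the numbers are hypothetical and do not reproduce Fig. 1.

```python
def press(reference, cv_predictions):
    """Predicted residual sum of squares from cross-validated predictions."""
    return sum((r - p) ** 2 for r, p in zip(reference, cv_predictions))


def optimal_factor_count(press_by_factors):
    """Return the number of latent factors with the lowest PRESS."""
    return min(press_by_factors, key=press_by_factors.get)


# Hypothetical PRESS values for 1 to 4 latent factors
press_values = {1: 9.1, 2: 7.7, 3: 8.4, 4: 8.9}
print(optimal_factor_count(press_values))
```

PRESS and RMSECV carry the same information for a fixed sample set, since RMSECV is the square root of PRESS divided by the number of samples.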

Variable selection by SiPLS model
SiPLS was also tested as a variable selection method. The full spectrum was split into several intervals, joint models were built from combinations of intervals, and the RMSECV was taken as the measure of model accuracy. The combination of intervals with the lowest RMSECV was chosen. In this way, spectral regions carrying little information about the property under study are eliminated while the important bands are retained, which reduces the vulnerability of the calibration model [20]. Selecting characteristic spectral regions may therefore improve the performance of the PLS model.
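The interval search described above can be sketched as an exhaustive loop over interval combinations, for example 10 equidistant intervals combined three at a time, which gives 120 candidate models. The scoring function is left as a user-supplied callable that would return the RMSECV of a PLS model built on the chosen ranges. All names and the toy score are assumptions made for illustration.

```python
from itertools import combinations


def split_into_intervals(n_variables, n_intervals):
    """Split variable indices 0..n_variables into near-equal (start, stop) ranges."""
    bounds = [round(i * n_variables / n_intervals) for i in range(n_intervals + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def best_interval_combination(n_variables, n_intervals, n_combined, rmsecv):
    """Return the interval indices whose joint model has the lowest RMSECV."""
    intervals = split_into_intervals(n_variables, n_intervals)
    candidates = combinations(range(n_intervals), n_combined)
    return min(candidates, key=lambda combo: rmsecv([intervals[i] for i in combo]))


def toy_score(ranges):
    """Stand-in for a cross-validated PLS fit: favours intervals starting near index 30."""
    return sum(abs(start - 30) for start, _ in ranges)


print(best_interval_combination(100, 10, 3, toy_score))
```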
In this paper, the SiPLS model was built by combining 3 of 10 equidistant subintervals and using 2 latent factors. The RMSECV, RMSEP and RMSEC were 0.574%, 0.648% and 0.416%, respectively, all higher than those of the full-spectrum PLS model (0.483%, 0.631% and 0.384%), indicating poorer performance of the SiPLS model. This is probably because the full spectrum contained more of the information needed in this case. Therefore, the full-spectrum calibration set was used directly as matrix X to build the PLS model.

Establishment of PLS model
Based on the above analysis, the spectra preprocessed by 11-point SG with second derivative were used to build the PLS model with 2 latent factors. The RMSECV, RMSEP and RMSEC were 0.483%, 0.631% and 0.384%, respectively. The R² values were all higher than 0.9, indicating good accuracy of the PLS model.

Quantification and chemical image reconstruction
One tablet from each batch of the prediction set was measured and analyzed as described in Secs. 2.5 and 3.1, respectively. A total of 6400 spectra were acquired for each sample ($(2000 \times 2000)/(25 \times 25) = 6400$ for an area of 2000 μm × 2000 μm at a spatial resolution of 25 μm × 25 μm). Each spectrum was applied to the PLS model to predict the concentration at each pixel. Figure 3 shows the concentration images of the three samples in the prediction set. The images were reconstructed from the predicted concentration of each pixel according to its original spatial location. The color scale from blue to red represents concentrations from low to high, so the spatial distribution of the API can be visualized. The mean concentration of CPM in each sample was calculated by averaging the predicted concentrations of all pixels. The mean concentrations of the three samples were 4.09%, 5.95% and 5.85%, respectively.

Assessment of distributional homogeneity of CPM
As shown in Fig. 3, the CPM distribution of sample 1 appeared dispersed and homogeneous to the eye. In contrast, some large red areas could be observed on the surfaces of sample 2 and sample 3, indicating aggregation and agglomeration of CPM. By visual inspection, the CPM distribution of sample 1 was considered the most homogeneous, while that of sample 2 might be the most inhomogeneous. However, an assessment by the naked eye alone may be subjective. The DHI method was therefore used to assess the CPM distributional homogeneity of the different samples further. The CPM concentration images were 80 × 80 pixels in size, so each image was sampled with macropixels of sizes from 2 × 2 pixels to 80 × 80 pixels.
First, the standard deviation (Std) was computed for each macropixel size on the real concentration image. Second, the real concentration image was randomized to generate its corresponding random image. The random image was likewise sampled with the different macropixel sizes and the Std value for each size was calculated. The homogeneity curves of the real and random distribution images were then drawn by plotting the Std value against macropixel size (Fig. 4). Finally, the areas under the real and random homogeneity curves were calculated and the DHI value was obtained.
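The steps above can be sketched as follows. Several details are assumptions made for this illustration rather than statements about the original implementation: the Std is taken over the mean values of all overlapping macropixels of a given size, the areas are computed with the trapezoidal rule, and the random curve may be averaged over several shuffles (one shuffle by default). A summed-area table keeps the block means cheap to compute.

```python
import random
from statistics import pstdev


def homogeneity_curve(image):
    """Std of macropixel means for block sizes 2 x 2 up to the full image."""
    n_rows, n_cols = len(image), len(image[0])
    # Summed-area table with a zero border for fast block sums
    sat = [[0.0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            sat[i + 1][j + 1] = image[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j]
    curve = []
    for size in range(2, min(n_rows, n_cols) + 1):
        means = []
        for i in range(n_rows - size + 1):
            for j in range(n_cols - size + 1):
                total = (sat[i + size][j + size] - sat[i][j + size]
                         - sat[i + size][j] + sat[i][j])
                means.append(total / size ** 2)
        curve.append(pstdev(means))
    return curve


def area_under_curve(curve):
    """Trapezoidal area with unit spacing between macropixel sizes."""
    return sum((a + b) / 2 for a, b in zip(curve, curve[1:]))


def dhi(image, n_random=1, seed=0):
    """Ratio of the areas under the real and the randomized homogeneity curves."""
    rng = random.Random(seed)
    n_cols = len(image[0])
    flat = [v for row in image for v in row]
    real_area = area_under_curve(homogeneity_curve(image))
    random_areas = []
    for _ in range(n_random):
        rng.shuffle(flat)
        shuffled = [flat[k:k + n_cols] for k in range(0, len(flat), n_cols)]
        random_areas.append(area_under_curve(homogeneity_curve(shuffled)))
    return real_area / (sum(random_areas) / n_random)


# Toy 8 x 8 image with all the "API" clustered in the left half
clustered = [[1.0] * 4 + [0.0] * 4 for _ in range(8)]
print(dhi(clustered, n_random=5, seed=1))
```

A strongly clustered image such as this toy example gives a value well above 1, whereas an image whose pixels are already randomly arranged gives a value near 1.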
The DHI values of the three samples were 1.394, 4.300 and 2.635, respectively. The closer the DHI value is to 1, the more homogeneous the tablet. Therefore, the CPM distribution of sample 1 was the most homogeneous, while that of sample 2 was the most inhomogeneous.

Conclusions
NIR-CI provides the spatial distribution of the components in a sample in addition to spectral information. By unfolding the hypercube into a two-dimensional matrix, conventional two-way chemometric approaches can be used to extract the chemical images of interest. In this study, the PLS method was used to obtain the concentration information of CPM tablets. A total of 33 mean spectra and the theoretical CPM contents of each batch were used as matrices X and Y to establish the PLS model. The data matrix of the prediction set was applied to the PLS model and the spectral information was converted into concentration information. The CPM concentration images were generated by refolding the two-dimensional matrix according to the original spatial location of each pixel. The DHI criterion was used to assess the CPM distributional homogeneity of the different samples. The results indicated that the CPM distributional homogeneity of the three samples decreased in the order sample 1, sample 3, sample 2.
NIR-CI has shown great potential in the pharmaceutical industry as an emerging PAT tool. With the help of chemometrics, the measured spectral information can be transformed into chemical information, and spatial information is obtained at the same time. The ability to provide distributional information makes NIR-CI especially suitable for assessing the homogeneity of components in a sample. More attention should be paid to NIR-CI and more effort should be made to promote its development.

Table 1. Contents of the components in each batch (w/w).

Table 2. Different pretreatment methods of the PLS model.

Figure 2 shows the calibration and validation regressions for the PLS model; the reference and predicted values lie close to a straight line. The parameters of the PLS model for the API indicated that the model was reliable.