Algorithm to identify circulating tumor cell clusters using in vivo ° ow cytometer

Kai Pang*, Dan Wei*, Pengfei Hai, Zhangru Yang, Xiaofu Weng* and Xunbin Wei* *State Key Laboratory of Oncogenes and Related Genes Shanghai Cancer Institute Med-X Research Institute and School of Biomedical Engineering Shanghai Jiao Tong University 1954 Huashan Road, Shanghai 200030, P. R. China Department of Biomedical Engineering Washington University in St. Louis One Brookings Drive, St. Louis, Missouri 63130, USA Radiation Oncology Center Fudan University Shanghai Cancer Center (FUSCC) Shanghai 200032, P. R. China xwei01@sjtu.edu.cn


Introduction
Circulating tumor cells (CTCs) have been considered an important biomarker for tumor diagnosis and prognosis. 1,24][5] These results indicated that circulating tumor cell clusters might survive and metastasize more easily than individual circulating tumor cells.The ability to detect circulating tumor cell clusters is necessary for studies of tumor metastasis.7][8][9] The circulating tumor cell clusters may not be present in blood samples due to limited blood sample volumes.In addition, blood sample treatments may alter the properties of circulating cell clusters. 10,11For example, the addition of an anticoagulant may disperse the cell clusters, which may be a reason that little attention has previously been paid to circulating cell clusters.
The in vivo °ow cytometer (IVFC) has been used to detect and enumerate CTCs in many reported studies.  Withonger detection times in vivo, a larger volume of blood is examined, and CTCs are more likely to be detected with IVFC compared to conventional in vitro detection methods.To accurately identify these cell clusters, we propose a new signal processing criterion and algorithm.Signals with larger peak widths, as detected by IVFC, have been used as a criterion to identify cell clusters in previous studies. 3However, the accuracy of this criterion might be greatly degraded by changes in blood °ow and the rolling behaviors of circulating tumor cells.
Here, we propose a criterion and algorithm to distinguish cell clusters from single cells, based on computer simulation and intravital imaging.In this work, we established two models of °uorescent molecule distribution for two kinds of °uorescently labeled cells.To better analyze the signals from cell clusters, we ¯rst used area-based and volume-based models for single °uorescent cells.Using simulations of each model, we analyzed the morphology of IVFC signals from single cells and cell clusters.
According to the Rayleigh criterion, the valley between two adjacent peak signals, from two distinguishable cells, should be lower than 73.5% of the peak values, and a novel signal processing algorithm for IVFC was developed based on this criterion.Our results showed that cell clusters could be reliably identi¯ed using our proposed algorithm.The results from intravital imaging also supported the use of our method.

Cell preparation
The 4T1 breast cancer cells were transfected with green °uorescent protein (GFP).They were cultured and expanded in an incubator, which maintained a temperature of 37 C and 5% CO 2 .The cells were used after the sixth passage with stable GFP expression and then the cells were prepared for the simulation using the volume-based cell model.

Animal preparation
Balb/c mice (20 AE 2 g, 5 week-old) were purchased from Shanghai SLAC Laboratory Animal Co. Ltd.The animal treatment procedures were approved and monitored by the Ethical Committee of Animal Experiments in the School of Biomedical Engineering, Shanghai Jiao Tong University.The mice were anesthetized with 1% pentobarbital sodium salt (0.01 mL/g mouse weight) during the experiment.To remove hair on the mouse ear, depilatory cream (sensitive hair removal cream; Veet) was topically applied.

IVFC
To simulate the proposed cell model, IVFC was used to collect signals from GFP-labeled cells.The schematic of IVFC was shown in supplementary Fig. S1.Brie°y, the mouse was anesthetized and placed on the sample stage.With a 535 nm LED as the light source, live images of the mouse's ear were captured.These images were used to guide navigation of the laser beam onto a blood vessel for detection.Typically, 50-70 m-wide blood vessels were selected as the detection site.The 488 nm laser beams were modulated into slit-shaped beams using cylindrical lens.The size of the laser slit was approximately 5 Â 72 m at the focal plane.Under the guidance of the ear imaging by CCD camera, the laser slit was placed across the selected blood vessel.Fluorescently labeled cells were introduced into the blood by tail vein injection.When a cell passed through the laser beam, °uorescence was monitored by the detection system.After photo-electrical conversion via a photomultiplier tube (PMT), the signals were digitized (100 KHz) and recorded on a computer.

Computer simulation and data processing
Computer simulations were performed on the MATLAB (2015a, Mathworks) computing platform.The plots were generated according to the area-based cell model or volume-based cell model.Data processing was performed according to the criterion and algorithm proposed using homemade MATLAB scripts.

Intravital imaging
Intravital imaging was performed with a Leica DM5500 system with a 10Â objective (water immersion, NA ¼ 0:6).The frame rate was 30 fps.

Area-based model of single cell
To better analyze a situation where multi-cells pass through the laser slit, models for single cells were ¯rst constructed.Two di®erent models for single cells corresponded to di®erent °uorescent labeling methods and °uorescent dyes.
The area-based single cell model was studied ¯rst.This model could be applied to °uorescent dyes that attached surface membrane of the cells, such as DiD.To simplify the simulation model, the laser power within the laser slit beam was considered uniform.The emitted °uorescence was proportional to the area that was irradiated by the laser beam.
In this model of peak signals, key parameters included the cell radius (R), width of the excitation laser (W ), °ow speed of the cells (v) and time (t).When the cell size was larger than the width of laser beam, the relationship between the parameters was as follows: R ! 1 2 W. The process of the cell passing through the laser beam was divided into three phases (Fig. 1(a)).The relationship between °uorescence (F ) and t during the whole procedure is plotted in Fig. 1(b).
For the ¯rst phase, the front of the cell reached the front of the excitation laser beam, but the front of the cell had not reached the back of the excitation laser beam.This section was described as follows: 0 vt W .The °uorescence (F ) detected within this time was described as follows: k is the ratio between the detected °uorescence and excited area, and S is the laser irradiated area.During this period, S could be described as 2Rvt.
Based on these equations, the °uorescence intensity increased proportionally within this time as For the second phase, the front of the cell passed the back of the excitation laser beam, but the back of the cell had not reached the front end of the excitation laser beam.This section was described as follows: W vt 2R.In the second section, the excitation area in the cell equaled half the di®erence between the two ball crowns intercepted by the entire excitation laser beam.The °uorescence (F ) within this time was described as The °uorescence intensity was consistent during the second section.
For the third phase, the back of the cell passed the front of the excitation laser beam, but the back of the cell had not reached the back of the excitation laser beam.This section was described as 2R vt 2R þ W .The excitation area of the cell equaled Rð2R þ W À vtÞ.The °uorescence (F ) over time was described as follows: From the equation above, the °uorescence intensity decreased proportionally during this section.Figures 1(c) and 1(d) depicts the F -t relationship at di®erent °ow velocities (v) and radii of cells (R), respectively.

Volume-based model of single cell
The volume-based peak model indicated that the intensity of the emitted °uorescence was proportional to volume of cells excited by the laser.In this model, °uorescence peaks corresponded to cells expressing °uorescence proteins in the cytosol, such as GFP.
In this model, key parameters included the cell radius (R), width of the excitation laser (l), °ow speed of the cells (v) and time (t).When cells were larger than the excitation laser area, the relationship between the parameters was as follows: R ! 1 2 W . Based on the relative positions of the cell and excitation laser beam, the cell passed through the excitation laser in three phases (Fig. 2(a)).The relationship between F and t during this process is plotted in Fig. 2(b).
For the ¯rst phase, the front of the cell reached the front of the excitation laser beam, but the front of the cell had not reached the back end of the excitation laser beam: 0 vt W . Here, the excitation volume of the cell is the volume of the ball crown that is intercepted by the front end of the excitation laser beam.The °uorescence (F ) during this time is Here, k is the ratio between the °uorescence intensity and excited volume, and V is the volume irradiated by the laser.Finally, F could be described by the following equation: Based on the above equation, the °uorescence intensity increased gradually during the ¯rst phase.
For the second phase, the front of the cell had passed the back of the excitation laser beam, but the back of the cell has not reached the front of the excitation laser beam.This section is described as follows: W vt 2R.Here, the excitation volume in the cell equals the di®erence between the volumes of two ball crowns intercepted by the two ends of the excitation laser beam.After integrating, the °uorescence (F ) within the time was as follows: Based on this equation, the °uorescence intensity ¯rst increased to the local maximum and then began to fall.
For the third phase, the back of the cell has passed the front of the excitation laser beam, but the back of the cell has not reached the back of the excitation laser.This is described as follows: v, t and l:2R vt 2R þ W .In the third phase, the excitation volume in the cell equals the volume of the ball crown intercepted by the back of the excitation laser beam.After integrating, the °uorescence (F ) within this time was as follows: The °uorescence intensity decreased over time during the third phase.Figures 2(c) and 2(d) depict the F-t relationship at di®erent °ow velocities (v) and radii (R) of cells, respectively.

Models of dual-cell cluster
To better analyze the complex cases when clusters with multiple cells pass through the excitation laser beam, models of dual-cell clusters were designed, based on the single cell models above.For the dualcell models, additional parameters were added, including the radii of the two cells (R1 and R2) and the distance between the front ends of the two cells at the horizontal axis (d).The horizontal axis corresponded to blood °ow in the vessels.The distances are all referred to base on this horizontal axis.If the distance between the two cells was larger than the sum of both radii and the slit, the resulting peaks would not overlap in the time domain.This condition was the same as that of two separate single peaks.The relative sizes and positions of the two cells is the key concept here.
For the ¯rst cell, the process was similar to that in the single cell model.For the second cell, the starting time changed to t þ d v , where d is the distance between the fronts of the two cells and v is the velocity of the °owing cells.We simulated the area-based dual-cell model, and then we studied the characteristics of the dualcell model that corresponded to di®erent distances between the two cells, as depicted in Figs.3(a) and 3(c).To distinguish two cells in the °uorescent signals, a local minimum should exist between two adjacent local maxima, and the valley should be lower. 33

Rayleigh criterion and algorithm to identify cell clusters
The resolution of in vivo °ow cytometry was based on optical imaging.To distinguish two adjacent points, there should be some di®erences in the brightness of an overlapping area between the two di®raction spots.According to the Rayleigh criterion, [34][35][36] the minimum value of brightness in the overlapping area should be 73.5% of the maximum value.
The algorithm to identify cell clusters was proposed based on the criterion.Brie°y, the raw °uorescence data were denoised using a wavelet ¯lter (Sym6, level 3).Then, the smoothed data trace was scanned to identify all local maxima, and a threshold was calculated according to the following equation: where median(data) represents the median value of the data; mad(data) indicated the absolute deviation from the mean of the data; and MadFactor was a multiplier factor to control the rigidness of the threshold, usually selected as 6. 37,38 The adjacent local maxima were clustered in a group if no local minima fell below the threshold.E®ective signals from cells occurred when the local minimum between two adjacent local maxima was below 73.5% of the lower maximum.
As shown in Fig. 4(a), the e®ective signal for a cell was indicated by a circle.This group of signals contained only one single cell.In Fig. 4(b), there were two local maxima in this signal °ock.However, the local minimum between these two local maxima was higher than 73.5% of the lower local maxima, indicated by the square.The lower peak was not regarded as an e®ective signal for the cell.Thus, one single cell was identi¯ed in this group of signals.
A dual-cell cluster was identi¯ed, as shown in Fig. 4(c).The local minimum satis¯ed the Rayleigh criterion.A cell luster with four cells was identi¯ed in a signal group, as depicted in Fig. 4(d).This suggested that the proposed criterion and algorithm were e®ective in identifying cell cluster signals.largest pro¯le value indicated by a black circle.The normalized valley pro¯le value between the two peak pro¯le values was below 0.735.This indicated that the group of signals represented a dual-cell cluster, complying with the Rayleigh criterion.An image of the cell cluster in the succeeding frame was shown in Fig. 5(c).Though intravital imaging was somewhat limited by motion blur, similar results were also observed.As shown in Fig. 5(d), the °uorescence pro¯le of the cell cluster also complied with our proposed criterion.

Discussion
In this work, we established two cell models for two di®erent °uorescently labeled cells.The area-based model corresponds to cells with °uorescent molecules distributed on the surface of cell membrane, like DiD.The volume-based model can be applied to cells producing a °uorescent protein, such as GFP.
Based on these two cell models, we analyzed dualcell cluster signals, which represent the simplest type of cell clusters.Using the Rayleigh criterion, we developed a new algorithm to identify cell cluster signals.The e®ectiveness of the algorithm was veri¯ed with experiments.A potential limitation of IVFC is that the collected signals provide no spatial resolution in the direction perpendicular to the blood °ow.Two cells could pass through the laser slit side by side.The signals generated by these two patterns are barely distinguishable from single cell signals.In these situations, some cell clusters will be missed by IVFC.To obtain the spatial resolution, two possible solutions have been proposed and tested.One solution is to perform a line scan during data collection.The laser beam could be reshaped into a circle, and then the size of the laser beam could be reduced to generate a higher spatial resolution.A scanner could be used to move the laser spot along the direction perpendicular to the blood °ow.The other solution is intravital imaging.By capturing images of °owing cells, single CTCs and cell clusters can be distinguished based on distinct morphological information.However, this solution also su®ers from a limited frame rate.

Con°ict of Interest
All authors declare no potential con°icts of interest.

Fig. 1 .
Fig. 1.Area-based single cell model.(a) Phases of cell passing the laser beam in area-based model.(b) Typical area-based model of single cell.In this model, R ¼ 10 m, W ¼ 5 m, v ¼ 5 mm/s and k ¼ 1 mV/m 2 .(c) Area-based model of single cell with di®erent velocities.In this model, R ¼ 10 m, W ¼ 5 m and k ¼ 1 mV/m 2 .(d) Area-based model of single cell with di®erent radiuses.In this model, W ¼ 5 m, v ¼ 5 mm/s and k ¼ 1 mV/m 2 .A.U. denotes arbitrary unit.

Fig. 2 .
Fig. 2. Volume-based single cell model.(a) Phases of cell passing the laser beam in volume-based model.(b) Typical volume-based model of single cell.In this model, R ¼ 10 m, W ¼ 5 m, v ¼ 5 mm/s and k ¼ 1 mV/m 2 .(c) Volume-based model of single cell with di®erent velocities.In this model, R ¼ 10 m, W ¼ 5 m and k ¼ 1 mV/m 2 .(d) Volume-based model of a single cell with di®erent radii of cells.In this peak model, W ¼ 5 m, v ¼ 5 mm/s and k ¼ 1 mV/m 2 .A.U. denotes arbitrary unit.
Similar results could be observed using the volume-based dual-cell model, as shown in Figs.3(b) and 3(d).

Fig. 3 .
Fig. 3. Area-based and volume-based dual-cell models.(a) Phases of dual cells passing the beam in area-based model.(b) Phases of dual cells passing the laser beam in volume-based model.(c) Area-based dual-cell models with di®erent cell distances.In this peak model, v ¼ 5 mm/s, k ¼ 1 mV/m 2 , R1 ¼ R2 ¼ 10 m and W ¼ 5 m.(d) Volume-based dual-cell model with di®erent cell distances.In this model, V ¼ 5 mm/s, k ¼ 1 mV/m 2 , R1 ¼ R2 ¼ 10 m and W ¼ 5 m.A.U. denotes arbitrary unit.

Fig. 4 .
Fig. 4. Cell cluster identi¯cation using proposed algorithm based on Rayleigh criterion.(a) Typical data trace for single circulating tumor cell.Circle indicates position of local maximum.(b) Single cell signal identi¯ed by proposed algorithm.Square indicates local maximum, which is not considered an e®ective signal for cell.(c) Typical signal of cell cluster with dual cells.(d) Another typical signal of CTC cluster identi¯ed with proposed algorithm.This group of peaks indicates that this cell cluster consists of four cells.A.U. denotes arbitrary unit.

Fig. 5 .
Fig.5(a).Due to light absorption by hemoglobin, blood vessels appear dark in the image.The narrower vessel on the left is an artery, while the wider vessel on the right is a vein.A cell cluster consisting of two cells was observed in the vein.The part of the image within the dashed box was extracted to analyze the °uorescence pro¯le.The °uorescence pro¯le was calculated by integrating the °uorescence intensity within the black dotted box, which we then moved along the horizontal axis.The width of the box was $ 5 m, which was comparable to the laser slit in IVFC.Sliding the box mimicked a cell °owing through the laser slit.After getting the whole pro¯le, it was normalized by the second