Optimized splitting of mixed-species RNA sequencing data
Abstract
Gene expression studies using xenograft transplants or co-culture systems, usually combining human and mouse cells, have proven valuable for uncovering cellular dynamics during development and in disease models. However, the sequence similarity of mRNAs between species presents a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Aligning reads to a pooled reference index is effective, particularly when each read's optimal alignment is used to classify it by species and the classified reads are then re-aligned to the corresponding individual genome; this approach yields high accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks that extract characteristic sequence patterns from the two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both approaches perform well across different ratios of human to mouse reads. Although the alignment-independent strategy successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
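The alignment-based strategy summarized above (align to a pooled human + mouse reference, call each read's species from its best alignment, then re-align the species-specific reads to the matching single genome) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the species prefixes on contig names (`hs_`, `mm_`), the use of a per-alignment score such as the SAM `AS` tag, and the `min_margin` threshold for calling a read ambiguous are all hypothetical choices. The re-alignment step itself is left to whichever single-species aligner is in use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical convention: contigs in the pooled index carry a species prefix.
SPECIES_PREFIXES = {"hs_": "human", "mm_": "mouse"}


def species_of(contig: str) -> str:
    """Map a pooled-reference contig name to its species via its prefix."""
    for prefix, species in SPECIES_PREFIXES.items():
        if contig.startswith(prefix):
            return species
    raise ValueError(f"contig {contig!r} has no recognised species prefix")


def classify_reads(
    alignments: Iterable[Tuple[str, str, int]], min_margin: int = 5
) -> Dict[str, str]:
    """Assign each read to the species of its best-scoring alignment.

    alignments: (read_id, contig, alignment_score) tuples, e.g. parsed from a
    BAM of reads aligned to the pooled index (score assumed to be like AS:i).
    A read is 'ambiguous' when its best human and best mouse scores differ by
    less than min_margin (a hypothetical tuning parameter).
    """
    # Keep only the best score per read per species.
    best: Dict[str, Dict[str, int]] = defaultdict(dict)
    for read_id, contig, score in alignments:
        sp = species_of(contig)
        prev = best[read_id].get(sp)
        if prev is None or score > prev:
            best[read_id][sp] = score

    calls: Dict[str, str] = {}
    for read_id, scores in best.items():
        human = scores.get("human")
        mouse = scores.get("mouse")
        if mouse is None:  # aligned to human contigs only
            calls[read_id] = "human"
        elif human is None:  # aligned to mouse contigs only
            calls[read_id] = "mouse"
        elif human - mouse >= min_margin:
            calls[read_id] = "human"
        elif mouse - human >= min_margin:
            calls[read_id] = "mouse"
        else:
            calls[read_id] = "ambiguous"
    return calls


def split_by_species(calls: Dict[str, str]) -> Dict[str, List[str]]:
    """Group read IDs by call; each list would feed a single-genome re-alignment."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, call in sorted(calls.items()):
        groups[call].append(read_id)
    return dict(groups)


if __name__ == "__main__":
    demo = [
        ("r1", "hs_chr1", 60), ("r1", "mm_chr4", 42),  # clearly human
        ("r2", "hs_chr7", 30), ("r2", "mm_chr5", 58),  # clearly mouse
        ("r3", "hs_chr2", 50), ("r3", "mm_chr1", 48),  # too close to call
        ("r4", "mm_chrX", 55),                         # mouse-only hit
    ]
    print(split_by_species(classify_reads(demo, min_margin=5)))
```

In this sketch, reads called `ambiguous` are simply set aside; the `human` and `mouse` ID lists would then be used to extract those reads from the original FASTQ files for alignment against the individual genomes. Requiring a score margin, rather than taking any higher score, is one illustrative way to trade a few discarded reads for fewer cross-species misassignments.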