Small parsimony for natural genomes in the DCJ-indel model
Abstract
The Small Parsimony Problem (SPP) aims at finding the gene orders at internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an Integer Linear Program algorithm to solve the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model that we consider is the DCJ-indel model that includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it is able to reconstruct very efficiently and accurately ancestral gene orders in a very comprehensive evolutionary model.
The complete paper is available at https://arxiv.org/abs/2108.04297.
References
- 1. , The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution, Nature 50 :270–277, 2018. https://doi.org/10.1038/s41588-017-0036-1 Google Scholar
- 2. , Karyotype and gene order evolution from reconstructed extinct ancestors highlight contrasts in genome plasticity of modern rosid crops, Genome Biol Evol 7(3) :735–749, 2015. https://doi.org/10.1093/gbe/evv014 Crossref, Medline, Google Scholar
- 3. , The pineapple genome and the evolution of CAM photosynthesis, Nat Genome 47(12) :1435–1442, 2015. https://doi.org/10.1038/ng.3435 Crossref, Medline, Google Scholar
- 4. , Highly evolvable malaria vectors: The genome of 16 Anopheles mosquitoes, Science 347(6217) :1258522, 2015. https://doi.org/10.1126/science.1258522 Crossref, Medline, Google Scholar
- 5. , Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies, BMC Biol 18(1) :1, 2020. https://doi.org/10.1186/s12915-019-0728-3 Crossref, Medline, Google Scholar
- 6. , Phylogenetic signal from rearrangements in 18 Anopheles species by joint scaffolding extant and ancestral genomes, BMC Genome 19 :96, 2018. https://doi.org/10.1186/s12864-018-4466-7 Crossref, Medline, Google Scholar
- 7. , Reconstructing ancestral gene orders using conserved intervals, Jonassen I, Kim J (eds.), Algorithms in Bioinformatics, 4th Int Workshop Proc,
Lecture Notes in Computer Science , Vol. 3240, Springer, pp. 14–25, 2004. https://doi.org/10.1007/978-3-540-30219-3_2 Crossref, Google Scholar - 8. , Reconstructing contiguous regions of an ancestral genome, Genome Res 16 :1557–1565, 2006. Crossref, Medline, Google Scholar
- 9. , A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genomes, PLoS Comput Biol 4(11):e1000234, 2008. https://doi.org/10.1371/journal.pcbi.1000234 Crossref, Medline, Google Scholar
- 10. , Combinatorics of Genome Rearrangements, MIT Press, 2009. Crossref, Google Scholar
- 11. , The median problems for breakpoints are NP-complete, Electron Colloq Comput Compl (ECCC) 5(71):1–16, 1998. Google Scholar
- 12. , The reversal median problem, INFORMS J Comput 15(1) :93–113, 2003. https://doi.org/10.1287/ijoc.15.1.93.15155 Crossref, Google Scholar
- 13. , Multichromosomal median and halving problems under different genomic distances, BMC Bioinf 10, 2009. https://doi.org/10.1186/1471-2105-10-120 Crossref, Medline, Google Scholar
- 14. , On the complexity of rearrangement problems under the breakpoint distance, J Comput Biol 21(1) :1–15, 2014. https://doi.org/10.1089/cmb.2013.0004 Crossref, Medline, Google Scholar
- 15. , SCJ: A breakpoint-like distance that simplifies several rearrangement problems, IEEE/ACM Trans Comput Biol Bioinf 8(5) :1318–1329, 2011. https://doi.org/10.1109/TCBB.2011.34 Crossref, Medline, Google Scholar
- 16. , The scj small parsimony problem for weighted gene adjacencies, IEEE/ACM Trans Comput Biol Bioinf 16(5) :1364–1373, 2019. https://doi.org/10.1109/TCBB.2017.2661761 Crossref, Medline, Google Scholar
- 17. , Comparing genomes with duplications: A computational complexity point of view, IEEE/ACM Trans Comput Biol Bioinf 4(4) :523–534, 2007. https://doi.org/10.1145/1322075.1322079 Crossref, Medline, Google Scholar
- 18. , On the approximability of comparing genomes with duplicates, J Graph Algorithms Appl 13(1) :19–53, 2009. Crossref, Google Scholar
- 19. ,
Duplication, rearrangement, and reconciliation , in Comparative Genomics, Springer, Netherlands, pp. 537–550, 2000. https://doi.org/10.1007/978-94-011-4309-7_46 Crossref, Google Scholar - 20. ,
Duplication, rearrangement and reconciliation: A follow-up 13 years later , in Models and Algorithms for Genome Evolution, Springer, London, pp. 47–62, 2013. https://doi.org/10.1007/978-1-4471-5298-9_4 Crossref, Google Scholar - 21. , DUPCAR: Reconstructing contiguous ancestral regions with duplications, J Comput Biol 15(8) :1007–1027, 2008. https://doi.org/10.1089/cmb.2008.0069 Crossref, Medline, Google Scholar
- 22. , DeCoSTAR: Reconstructing the ancestral organization of genes or genomes using reconciled phylogenies, Genome Biol Evol 9(5) :1312–1319, 2017. https://doi.org/10.1093/gbe/evx069 Crossref, Medline, Google Scholar
- 23. , Reversing gene erosion - reconstructing ancestral bacterial genomes from gene-content and order data, Algorithms in Bioinformatics, 4th Int Workshop, Proc,
Lecture Notes in Computer Science , Vol. 3240, Springer, Berlin, Heidelberg, pp. 1–13, 2004. https://doi.org/10.1007/978-3-540-30219-3_1 Crossref, Google Scholar - 24. , A flexible ancestral genome reconstruction method based on gapped adjacencies, BMC Bioinf 13(S-19) :S4, 2012. https://doi.org/10.1186/1471-2105-13-S19-S4 Crossref, Medline, Google Scholar
- 25. , Reconstruction of ancestral gene orders using probabilistic and gene encoding approaches, PloS One 9(10) :1–11, 2014. https://doi.org/10.1371/journal.pone.0108796 Crossref, Google Scholar
- 26. , Ancestral reconstruction with duplications using binary encoding and probabilistic model, 7th Int Conf Bioinformatics and Computational Biology (BICoB 2015), Honolulu, HI, USA, pp. 97–104, 2015. Google Scholar
- 27. , Reconstructing ancestral gene orders with duplications guided by synteny level genome reconstruction, BMC Bioinf 17(S-14) :201–212, 2016. https://doi.org/10.1186/s12859-016-1262-8 Medline, Google Scholar
- 28. , The distance and median problems in the single-cut-or-join model with single-gene duplications, Algorithms Molecular Biol 15(1) :8, 2020. https://doi.org/10.1186/s13015-020-00169-y Crossref, Medline, Google Scholar
- 29. Mane AC, Genome Rearrangement Problems Accounting for Duplicate Genes, Master’s Thesis, Simon Fraser University, Department of Mathematics, 2018, https://summit.sfu.ca/item/18563. Google Scholar
- 30. , Efficient sorting of genomic permutations by translocation, inversion and block interchange, Bioinformatics 21(16) :3340–3346, 2005. https://doi.org/10.1093/bioinformatics/bti535 Crossref, Medline, Google Scholar
- 31. , A unifying view of genome rearrangements, Bucher P, Moret BME (eds.), Algorithms in Bioinformatics, 6th Int Workshop Proc,
Lecture Notes in Computer Science , Vol. 4175, Springer, pp. 163–173, 2006. https://doi.org/10.1007/11851561_16 Crossref, Google Scholar - 32. , An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes, J Comput Biol 22(5) :425–435, 2015. https://doi.org/10.1089/cmb.2014.0096 Crossref, Medline, Google Scholar
- 33. , Computing the rearrangement distance of natural genomes, Research in Computational Molecular Biology - 24th Annual Int Conf Proc,
Lecture Notes in Computer Science , Vol. 12074, Springer, pp. 3–18, 2020. https://doi.org/10.1007/978-3-030-45257-5_1 Crossref, Google Scholar - 34. , Linearization of median genomes under the double-cut-and-join-indel model, Evolut Bioinf 15:1176934318820534, 2019. https://doi.org/10.1177/1176934318820534 Medline, Google Scholar
- 35. , Linearization of ancestral genomes with duplicated genes, BCB ’20: 11th ACM Int Conf Bioinformatics, Computational Biology and Health Informatics, Virtual Event, ACM, pp. 53:1–53:10, 2020. https://doi.org/10.1145/3388440.3412484 Crossref, Google Scholar
- 36. ,
Ancestral genome organization as a diagnosis tool for phylogenomics , Scornavacca C, Delsuc F, Galtier N (eds.), in Phylogenetics in the Genomic Era,No commercial publisher — Authors open access book , pp. 2.5:1–2.5:19, 2020. Google Scholar - 37. , Double cut and join with insertions and deletions, J Comput Biol 18(9) :1167–1184, 2011. https://doi.org/10.1089/cmb.2011.0118 Crossref, Medline, Google Scholar
- 38. , On computing breakpoint distances for genomes with duplicate genes, J Comput Biol 24(6) :571–580, 2017. https://doi.org/10.1089/cmb.2016.0149 Crossref, Medline, Google Scholar
- 39. Gurobi Optimizer, https://www.gurobi.com/products/gurobi-optimizer. Google Scholar
- 40. , Zombi: A phylogenetic simulator of trees, genomes and sequences that accounts for dead linages, Bioinformatics 36(4) :1286–1288, 2019. https://doi.org/10.1093/bioinformatics/btz710 Crossref, Google Scholar