EXACT SOLUTIONS FOR SPECIES TREE INFERENCE FROM DISCORDANT GENE TREES
Abstract
Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
References
- J. Biochem. 129(4), 615 (2001). Crossref, Medline, Google Scholar
- Biochimica et Biophysica Acta (BBA) — Proteins and Proteomics 1702(1), 111 (2004). Crossref, Google Scholar
- FEBS. J. 274(2), 512 (2007). Crossref, Medline, Google Scholar
- Proc. R Soc. London, Ser. B 269(1500), 1555 (2002). Crossref, Medline, Google Scholar
- Syst. Biol. 28(2), 132 (1979). Crossref, Google Scholar
- Comp. Genom. 1 , 525 ( 2000 ) . Crossref, Google Scholar
-
R. Page and E. Holmes , Molecular Evolution: A Phylogenetic Approach ( Wiley-Blackwell , 1998 ) . Google Scholar - Syst. Biol. 48(4), 814 (1999). Crossref, Medline, Google Scholar
J. G. Burleigh , Inferring species trees from gene duplication episodes, Proc First ACM Int Conf Bioinformatics and Computational Biology (ACM, 2010) pp. 198–203. Google ScholarB. Ma , M. Li and L. Zhang , On reconstructing species trees from gene trees in term of duplications and losses, Proc. RECOMB 98 (ACM, 1998) pp. 182–191. Google Scholar- IEEE/ACM Trans Comput. Biol. Bioinf. 8(6), 1685 (2011). Crossref, Medline, Google Scholar
- BMC. Bioinf. 11 ( ) , S42 ( 2010 ) . Crossref, Medline, Google Scholar
- , Research in Computational Molecular Biology, eds.
T. Speed and H. Huang (Springer, Berlin/Heidelberg, 2007) pp. 238–252. Crossref, Google Scholar - IEEE/ACM Trans Comput. Biol. Bioinf. 5(4), 514 (2008). Crossref, Medline, Google Scholar
- IEEE/ACM Trans Comput. Biol. Bioinf. 6(2), 221 (2009). Crossref, Medline, Google Scholar
- BMC. Bioinf. 11(1), 574 (2010). Crossref, Medline, Google Scholar
P. Górecki , J. G. Burleigh and O. Eulenstein , GTP supertrees from unrooted gene trees: Linear time algorithms for NNI based local searches, Bioinformatics Research and Applications, ISBRA 20127292,Lecture Notes in Computer Science , eds.L. F. Bleris (Springer, 2012) pp. 102–114. Google Scholar- Syst. Biol. 55(1), 21 (2006). Crossref, Medline, Google Scholar
- Bioinformatics 24(13), i123 (2008). Crossref, Medline, Google Scholar
- Bioinformatics 24(13), 1540 (2008). Crossref, Medline, Google Scholar
- Algorithms. Mol. Biol. 5 , 18 ( 2010 ) , DOI: 10.1186/1748-7188-5-18 . Crossref, Medline, Google Scholar
- Math. Biosci. 53(2), 131 (1981). Crossref, Google Scholar
- , New Approaches in Classification and Data Analysis, eds.
E. Diday (Springer, Berlin, Heidelberg, 1994) pp. 136–140. Crossref, Google Scholar - IEEE/ACM Trans Comput. Biol. Bioinf. 3(2), 141 (2006). Crossref, Medline, Google Scholar
M. Chimani , S. Rahmann and S. Böcker , Exact ILP solutions for phylogenetic minimum flip problems, Proc. ACM-BCB 2010 (ACM, 2010) pp. 147–153. Google Scholar- Algorithms. Mol. Biol. 5 , 2 ( 2010 ) . Crossref, Medline, Google Scholar
- J. Comput. Biol. 17(3), 383 (2010). Crossref, Medline, Google Scholar
- , Computing and Combinatorics,
Lecture Notes in Computer Science 4598, ed.G. Lin (Springer, 2007) pp. 51–64. Crossref, Google Scholar S. Sridhar , Efficiently finding the most parsimonious phylogenetic tree via linear programming, Proc. 3rd Int. Conf. Bioinformatics Research and Applications, ISBRA 2007 (Springer-Verlag, 2007) pp. 37–48. Google Scholar- BMC. Bioinf. 12 ( ) , S14 ( 2011 ) . Crossref, Medline, Google Scholar
- BMC. Evol. Biol. 7 ( ) , S3 ( 2007 ) . Crossref, Medline, Google Scholar
- PLoS. Comput. Biol. 5(9), e1000501 (2009). Crossref, Medline, Google Scholar
J. Doyon and C. Chauve , Software Tools and Algorithms for Biological Systems (Springer, New York, 2011) pp. 287–295. Crossref, Google ScholarA. Wehe , J. G. Burleigh and O. Eulenstein , Algorithms for knowledge-enhanced supertrees, Proc. ISBRA 2012 (2012) pp. 263–274. Google Scholar- Syst. Biol. 46 , 523 ( 1997 ) . Crossref, Google Scholar
- J. Class. 2(1), 7 (1985). Crossref, Google Scholar
- J. Comput. Biol. 14(6), 724 (2007). Crossref, Medline, Google Scholar
- IEEE/ACM Trans. Comput. Biol. Bioinform. 10(2), 522 (2013). Crossref, Medline, Google Scholar
- Bioinformatics 23(2), e116 (2007). Crossref, Medline, Google Scholar
- SIAM. J. Comput. 30(3), 729 (2001). Crossref, Google Scholar
-
D. Knuth , The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations ( Addison-Wesley Professional , Boston , 2005 ) . Google Scholar - Syst. Biol. 56(3), 445 (2007). Crossref, Medline, Google Scholar
- ACM. Comput. Surv. 24(2), 195 (1992). Crossref, Google Scholar
- Bioinformatics 26(12), 1569 (2013). Crossref, Google Scholar
- Mol. Phylogenet. Evol. 6(2), 189 (1996). Crossref, Medline, Google Scholar
- Math. Hierar. Biol. 37 , 57 ( 1997 ) . Crossref, Google Scholar
- Mol. Phylogenet. Evol. 14(1), 89 (2000). Crossref, Medline, Google Scholar
- Syst. Biol. 51(4), 570 (2002). Crossref, Medline, Google Scholar
- Syst. Biol. 57(4), 574 (2008). Crossref, Medline, Google Scholar


