The potential of family-free rearrangements towards gene orthology inference
Abstract
Recently, we proposed an efficient integer linear programming (ILP) formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] that exactly computes the rearrangement distance between two genomes in a family-free setting. In such a setting, neither a prior classification of genes into families nor any further restriction on the genomes is imposed. Given two genomes, this ILP computes an optimal matching of their genes that simultaneously accounts for local mutations, captured by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP to infer groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings and then, in a second step, integrates them into gene families. Our approach is implemented as a pipeline that includes the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. Experiments on both simulated and real data gave promising results.
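The second step, integrating pairwise matchings into gene families, can be illustrated with a minimal sketch. The abstract does not specify the integration rule, so this example assumes the simplest one: genes end up in the same family whenever they are connected through a chain of matched pairs (connected components, computed with union-find). The function name `families_from_matchings`, the gene labels, and the toy matchings are hypothetical and are not taken from the FFGC pipeline.

```python
from collections import defaultdict


def families_from_matchings(pairwise_matchings):
    """Group genes into families as connected components of matched pairs.

    ASSUMPTION: connected components are used purely for illustration;
    the actual integration step of the pipeline may differ.
    """
    parent = {}

    def find(gene):
        # Register unseen genes as their own root, then walk to the root.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each matching is a list of (gene, gene) pairs for one genome pair.
    for matching in pairwise_matchings:
        for gene_a, gene_b in matching:
            union(gene_a, gene_b)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)

    # Sort members and families so the output is deterministic.
    return sorted((sorted(members) for members in groups.values()),
                  key=lambda family: family[0])


# Hypothetical toy input: three genomes A, B, C and their pairwise matchings.
toy_matchings = [
    [("A:g1", "B:g1"), ("A:g2", "B:g3")],  # A vs B
    [("B:g1", "C:g2")],                    # B vs C
    [("A:g2", "C:g5")],                    # A vs C
]

for family in families_from_matchings(toy_matchings):
    print(family)
```

Connected components are the most permissive choice: a single spurious matched pair merges two families, so a stricter rule (for example, requiring consistent support across several genome pairs) may be preferable in practice.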