World Scientific

Measuring consistency among gene set analysis methods: A systematic study

    Gene set analysis is a quantitative approach for generating biological insight from gene expression datasets. The abundance of gene set analysis methods speaks to their popularity, but raises the question of the extent to which results are affected by the choice of method. Our systematic analysis of 13 popular methods using 6 different datasets, from both DNA microarray and RNA-Seq origin, shows that this choice matters a great deal. We observed that the overall number of gene sets reported by each method differed by up to 2 orders of magnitude, and there was a bias toward reporting large gene sets with some methods. Furthermore, there was substantial disagreement between the 20 most statistically significant gene sets reported by the methods. This was also observed when expanding to the 100 most statistically significant reported gene sets. For different datasets of the same phenotype/condition, the top 20 and top 100 most significant results also showed little to no agreement even when using the same method. GAGE, PAGE, and ORA were the only methods able to achieve relatively high reproducibility when comparing the 20 and 100 most statistically significant gene sets. Biological validation on a juvenile idiopathic arthritis (JIA) dataset showed wide variation in terms of the relevance of the top 20 and top 100 most significant gene sets to known biology of the disease, where GAGE predicted the most relevant gene sets, followed by GSEA, ORA, and PAGE.
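One way to make the notion of agreement between the top-ranked gene sets concrete is a pairwise overlap score. The sketch below is only an illustration and is not taken from the article: it assumes each method's output is available as a mapping from gene set name to adjusted p-value, and it uses the Jaccard index as the overlap measure, which may differ from the metric the authors used. The method names, gene set names, and p-values are hypothetical toy values, not results from the study.

```python
from itertools import combinations


def top_k(results, k):
    """Return the names of the k gene sets with the smallest p-values.

    `results` maps gene set name -> adjusted p-value (assumed input format).
    Ties are broken alphabetically so the selection is deterministic.
    """
    ranked = sorted(results.items(), key=lambda item: (item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_agreement(method_results, k):
    """Jaccard overlap of the top-k gene sets for every pair of methods."""
    tops = {method: top_k(res, k) for method, res in method_results.items()}
    return {
        (m1, m2): jaccard(tops[m1], tops[m2])
        for m1, m2 in combinations(sorted(tops), 2)
    }


# Hypothetical toy input: two methods, three gene sets each.
toy_results = {
    "method_a": {"set1": 0.001, "set2": 0.01, "set3": 0.2},
    "method_b": {"set2": 0.002, "set3": 0.02, "set1": 0.5},
}

for pair, score in pairwise_agreement(toy_results, k=2).items():
    print(pair, round(score, 3))
```

With k set to 20 or 100, the same function would give one score per pair of methods; applying it to one method's results on two datasets of the same condition would give a simple reproducibility score in the same spirit.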

    Published: 20 November 2019