This Issue

writing-guides   ealerts
Connect with WS
 


IncMD: Incremental trie-based structural motif discovery algorithm

Ghada Badr

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia

IRI - The City of Scientific Research and Technological Applications, University and Research District, P. O. 21934, New Borg Alarab, Alexandria, Egypt

Isra Al-Turaiki

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia

Marcel Turcotte

Faculty of Engineering, School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada

Hassan Mathkour

College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia

Received: 1 October 2013
Revised: 1 June 2014
Accepted: 5 September 2014
Published: 31 October 2014

The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangements even if the underlying sequences are different. Over the past few years, hundreds of algorithms have been published for the discovery of sequential motifs, while less work has been done for the structural motifs case. Current structural motif discovery algorithms are limited in terms of accuracy and scalability. In this paper, we present an incremental and scalable algorithm for discovering RNA secondary structure motifs, namely IncMD. We consider the structural motif discovery as a frequent pattern mining problem and tackle it using a modified a priori algorithm. IncMD uses data structures, trie-based linked lists of prefixes (LLP), to accelerate the search and retrieval of patterns, support counting, and candidate generation. We modify the candidate generation step in order to adapt it to the RNA secondary structure representation. IncMD constructs the frequent patterns incrementally from RNA secondary structure basic elements, using nesting and joining operations. The notion of a motif group is introduced in order to simulate an alignment of motifs that only differ in the number of unpaired bases. In addition, we use a cluster beam approach to select motifs that will survive to the next iterations of the search. Results indicate that IncMD can perform better than some of the available structural motif discovery algorithms in terms of sensitivity (Sn), positive predictive value (PPV), and specificity (Sp). The empirical results also show that the algorithm is scalable and runs faster than all of the compared algorithms.

Keywords: Motif discovery; RNA secondary structure; data mining; A priori; Trie

Remember to check out the Most Cited Articles in JBCB !

Bioinformatics/ Biocomputing books
By authors from Stanford, ETH Zurich, NYU