OpenMP Implementation of Parallel Longest Common Subsequence Algorithm for Mathematical Expression Retrieval
Abstract
Given a mathematical expression in LaTeX or MathML format, retrieval algorithm extracts similar expressions from a database. In our previous work, we have used Longest Common Subsequence (LCS) algorithm to match two expressions of lengths, and , which takes time complexity. If there are database expressions, total complexity is , and an increase in also increases this complexity. In the present work, we propose to use parallel LCS algorithm in our retrieval process. Parallel LCS has time complexity with processors and total complexity can be reduced to . For our experimentation, OpenMP based implementation has been used on Intel processor with 4 cores. However, for smaller expressions, parallel version takes more time as the implementation overhead dominates the algorithmic improvement. As such, we have proposed to use parallel version, selectively, only on larger expressions, in our retrieval algorithm to achieve better performance. We have compared the sequential and parallel versions of our ME retrieval algorithm, and the performance results have been reported on a database of 829 mathematical expressions.
Communicated by Xueyan Tang
References
- 1. ,
A structure based approach for mathematical expression retrieval , in Multi-disciplinary Trends in Artificial Intelligence (Springer Berlin Heidelberg, 2012), pp. 23–34. Crossref, Google Scholar - 2. , Introduction to Algorithms (The MIT Press and McGraw-Hill Book Company, 1989). Google Scholar
- 3. , An efficient parallel algorithm for longest common subsequence problem on GPUs, Proceedings of the World Congress on Engineering,
2010 , pp. 499–504. Google Scholar - 4. , Parallel Programming in C with MPI and OpenMP (McGraw-Hill Education Group, 2003). Google Scholar
- 5. , An Introduction to Parallel Programming (Morgan Kaufmann, 2011). Google Scholar
- 6. , Recognition and retrieval of mathematical expressions, International Journal on Document Analysis and Recognition (IJDAR) 15 (2011) 331–357. Crossref, Google Scholar
- 7. , Tangent-CFT: An embedding model for mathematical formulas, Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval,
2019 , pp. 11–18. Google Scholar - 8. , Math information retrieval: User requirements and prototype implementation, in Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’08) (ACM, New York, NY, USA, 2008), pp. 187–196. Crossref, Google Scholar
- 9. , MathFind: A math-aware search engine, Proceedings of the International Conference on Information Retrieval,
2006 , pp. 735–735. Google Scholar - 10. , Math go! prototype of a content based mathematical formula search engine, Journal of Theoretical and Applied Information Technology 4(10) (2008) 1002–1012. Google Scholar
- 11. Springer, LaTeX search, http://www.latexsearch.com/ (2019). Google Scholar
- 12. Lucene, Indexing and retrieval library, http://lucene.apache.org (2020). Google Scholar
- 13. ,
An approach to mathematical search through query formulation and data normalization , in Towards Mechanized Mathematical Assistants,LNAI , Vol. 4573 (Springer Berlin/Heidelberg, 2007), pp. 342–355. Crossref, Google Scholar - 14. , Efficient cluster-based information retrieval from mathematical markup documents, World Applied Sciences Journal 17 (2012) 611–616. Google Scholar
- 15. ,
A search engine for mathematical formulae , in Proceedings of Artificial Intelligence and Symbolic Computation,LNAI , Vol. 4120 (Springer, 2006), pp. 241–253. Crossref, Google Scholar - 16. , Substitution tree indexing, Sixth International Conference on Rewriting Techniques and Applications, RTA-95,
LNCS , Vol. 914 (1995), pp. 117–131. Crossref, Google Scholar - 17. , Improving mathematics retrieval, Proceedings of Digital Mathematics Libraries,
Grand Bend ,2009 , pp. 37–48. Google Scholar - 18. , An approach to similarity search for mathematical expressions using MathML, Towards a Digital Mathematics Library,
Grand Bend ,2009 , pp. 27–35. Google Scholar - 19. , Keyword and image-based retrieval of mathematical expressions, in Document Recognition and Retrieval XVIII, 7874 (SPIE, 2011), pp. 1–10. Crossref, Google Scholar
- 20. , Math spotting: Retrieving math in technical documents using handwritten query images, 2011 International Conference on Document Analysis and Recognition,
2011 , pp. 446–451. Google Scholar - 21. , Performance evaluation in content-based image retrieval: Overview and proposals, Pattern Recognition Letters 22(5) (2001) 593–601. Crossref, ISI, Google Scholar
- 22. , Math search for the masses: Multimodal search interfaces and appearance-based retrieval, Conferences on Intelligent Computer Mathematics,
2015 , pp. 18–36. Google Scholar - 23. , Multi-stage math formula search: Using appearance-based similarity metrics at scale, Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval,
2016 , pp. 145–154. Google Scholar - 24. , Math expression retrieval using an inverted index over symbol pairs, Document Recognition and Retrieval XXII, 9402 (2015) 940207. Google Scholar
- 25. , Characterizing searches for mathematical concepts, in 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (IEEE, 2019), pp. 57–66. Crossref, Google Scholar
- 26. , Mathematical document categorization with structure of mathematical expressions, in 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (IEEE, 2017), pp. 1–10. Crossref, Google Scholar
- 27. , Structural similarity search for formulas using leaf-root paths in operator subtrees, Proceedings of the European Conference on Information Retrieval (ECIR),
January 2019 . Google Scholar - 28. , NTCIR-12 MathIR task overview, 12th NTCIR Conference on Evaluation of Information Access Technologies,
Japan ,2016 , pp. 299–308. Google Scholar - 29. , Extending OPENMP for task parallelism, Parallel Processing Letters 13(3) (2003) 341–352. Link, Google Scholar
- 30. , Multithreaded parallelism with OPENMP, Parallel Processing Letters 15(4) (2005) 367–378. Link, Google Scholar
- 31. , Nested OPENMP parallelization of a hierarchical data clustering algorithm, Parallel Processing Letters 20(2) (2010) 187–208. Link, Google Scholar
- 32. , Scaling support vector machines on modern HPC platforms, Journal of Parallel and Distributed Computing 76 (2015) 16–31. Crossref, ISI, Google Scholar
- 33. , Strategies for workload balancing in cluster-based image databases, Parallel Processing Letters 14(1) (2004) 33–44. Link, Google Scholar
- 34. , OpenMP parallelization strategies for a discontinuous galerkin solver, International Journal of Parallel Programming 47(5-6) (2019) 838–873. Crossref, ISI, Google Scholar
- 35. , OpenMP implementation of SPICE3 circuit simulator, International Journal of Parallel Programming 35(5) (2007) 493–505. Crossref, ISI, Google Scholar
- 36. , A study of integer sorting on multicores, Parallel Processing Letters 28(4) (2018) 1850014. Link, ISI, Google Scholar
- 37. , An OpenMP-based tool for finding longest common subsequence in bioinformatics, BMC Research Notes 12 (2019). Crossref, Google Scholar
- 38. , Parallel algorithms for the longest common subsequence problem, IEEE Transactions on Parallel and Distributed Systems 5(8) (1994) 835–848. Crossref, ISI, Google Scholar
- 39. , Time-efficient parallel algorithms for the longest common subsequence and related problems, Journal of Parallel and Distributed Computing 57(2) (1999) 212–223. Crossref, ISI, Google Scholar
- 40. ,
Fast parallel algorithms for the longest common subsequence problem using an optical bus , in Computational Science and Its Applications – ICCSA 2005 (Springer Berlin Heidelberg, 2005), pp. 338–348. Crossref, Google Scholar - 41. , Efficient CGM-based parallel algorithms for the longest common subsequence problem with multiple substring-exclusion constraints, Parallel Computing 91 (2020) 102598. Crossref, ISI, Google Scholar
- 42. , High level parallel skeletons for dynamic programming, Parallel Processing Letters 18(1) (2008) 133–147. Link, Google Scholar
- 43. P. Pavan Kumar, PACME – Mathematical Expression Image Database and its LaTeX Ground Truth, https://sites.google.com/site/ppkumarcs/datasets (2018). Google Scholar
- 44. , Isolated structural error analysis of printed mathematical expressions, Pattern Analysis and Applications 21(4) (2018) 1097–1107. Crossref, ISI, Google Scholar
- 45. , Parallel processing of biological sequence comparison algorithms, International Journal of Parallel Programming 17(3) (1988) 259–275. Crossref, ISI, Google Scholar
- 46. , Parallel programming paradigms and frameworks in big data era, International Journal of Parallel Programming 42(5) (2014) 710–738. Crossref, ISI, Google Scholar


