World Scientific
  • Search
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×
Our website is made possible by displaying certain online content using javascript.
In order to view the full content, please disable your ad blocker or whitelist our website www.worldscientific.com.

System Upgrade on Tue, Oct 25th, 2022 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at [email protected] for any enquiries.

OpenMP Implementation of Parallel Longest Common Subsequence Algorithm for Mathematical Expression Retrieval

    Given a mathematical expression in LaTeX or MathML format, retrieval algorithm extracts similar expressions from a database. In our previous work, we have used Longest Common Subsequence (LCS) algorithm to match two expressions of lengths, m and n, which takes O(mn) time complexity. If there are T database expressions, total complexity is O(Tmn), and an increase in T also increases this complexity. In the present work, we propose to use parallel LCS algorithm in our retrieval process. Parallel LCS has O(max(m,n)) time complexity with max(m,n) processors and total complexity can be reduced to O(Tmax(m,n)). For our experimentation, OpenMP based implementation has been used on Intel i3 processor with 4 cores. However, for smaller expressions, parallel version takes more time as the implementation overhead dominates the algorithmic improvement. As such, we have proposed to use parallel version, selectively, only on larger expressions, in our retrieval algorithm to achieve better performance. We have compared the sequential and parallel versions of our ME retrieval algorithm, and the performance results have been reported on a database of 829 mathematical expressions.

    Communicated by Xueyan Tang

    References

    • 1. P. Pavan Kumar, A. Agarwal and C. Bhagvati, A structure based approach for mathematical expression retrieval, in Multi-disciplinary Trends in Artificial Intelligence (Springer Berlin Heidelberg, 2012), pp. 23–34. CrossrefGoogle Scholar
    • 2. T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms (The MIT Press and McGraw-Hill Book Company, 1989). Google Scholar
    • 3. J. Yang, Y. Xu and Y. Shang, An efficient parallel algorithm for longest common subsequence problem on GPUs, Proceedings of the World Congress on Engineering, 2010, pp. 499–504. Google Scholar
    • 4. M. J. Quinn, Parallel Programming in C with MPI and OpenMP (McGraw-Hill Education Group, 2003). Google Scholar
    • 5. P. S. Pacheco, An Introduction to Parallel Programming (Morgan Kaufmann, 2011). Google Scholar
    • 6. R. Zanibbi and D. Blostein, Recognition and retrieval of mathematical expressions, International Journal on Document Analysis and Recognition (IJDAR) 15 (2011) 331–357. CrossrefGoogle Scholar
    • 7. B. Mansouri, S. Rohatgi, D. W. Oard, J. Wu, C. L. Giles and R. Zanibbi, Tangent-CFT: An embedding model for mathematical formulas, Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, 2019, pp. 11–18. Google Scholar
    • 8. J. Zhao, M.-Y. Kan and Y. L. Theng, Math information retrieval: User requirements and prototype implementation, in Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’08) (ACM, New York, NY, USA, 2008), pp. 187–196. CrossrefGoogle Scholar
    • 9. R. Miner and R. Munavalli, MathFind: A math-aware search engine, Proceedings of the International Conference on Information Retrieval, 2006, pp. 735–735. Google Scholar
    • 10. M. Adeel, H. S. Cheung and A. H. Khiyal, Math go! prototype of a content based mathematical formula search engine, Journal of Theoretical and Applied Information Technology 4(10) (2008) 1002–1012. Google Scholar
    • 11. Springer, LaTeX search, http://www.latexsearch.com/ (2019). Google Scholar
    • 12. Lucene, Indexing and retrieval library, http://lucene.apache.org (2020). Google Scholar
    • 13. R. Miner and R. Munavalli, An approach to mathematical search through query formulation and data normalization, in Towards Mechanized Mathematical Assistants, LNAI, Vol. 4573 (Springer Berlin/Heidelberg, 2007), pp. 342–355. CrossrefGoogle Scholar
    • 14. M. Adeel, M. Sher and M. S. H. Khiyal, Efficient cluster-based information retrieval from mathematical markup documents, World Applied Sciences Journal 17 (2012) 611–616. Google Scholar
    • 15. M. Kohlhase and I. A. Sucan, A search engine for mathematical formulae, in Proceedings of Artificial Intelligence and Symbolic Computation, LNAI, Vol. 4120 (Springer, 2006), pp. 241–253. CrossrefGoogle Scholar
    • 16. P. Graf, Substitution tree indexing, Sixth International Conference on Rewriting Techniques and Applications, RTA-95, LNCS, Vol. 914 (1995), pp. 117–131. CrossrefGoogle Scholar
    • 17. S. Kamali and F. W. Tompa, Improving mathematics retrieval, Proceedings of Digital Mathematics Libraries, Grand Bend, 2009, pp. 37–48. Google Scholar
    • 18. K. Yokoi and A. Aizawa, An approach to similarity search for mathematical expressions using MathML, Towards a Digital Mathematics Library, Grand Bend, 2009, pp. 27–35. Google Scholar
    • 19. R. Zanibbi and B. Yuan, Keyword and image-based retrieval of mathematical expressions, in Document Recognition and Retrieval XVIII, 7874 (SPIE, 2011), pp. 1–10. CrossrefGoogle Scholar
    • 20. R. Zanibbi and L. Yu, Math spotting: Retrieving math in technical documents using handwritten query images, 2011 International Conference on Document Analysis and Recognition, 2011, pp. 446–451. Google Scholar
    • 21. H. Müller, W. Müller, D. M. Squire, S. Marchand-Maillet and T. Pun, Performance evaluation in content-based image retrieval: Overview and proposals, Pattern Recognition Letters 22(5) (2001) 593–601. Crossref, ISIGoogle Scholar
    • 22. R. Zanibbi and A. Orakwue, Math search for the masses: Multimodal search interfaces and appearance-based retrieval, Conferences on Intelligent Computer Mathematics, 2015, pp. 18–36. Google Scholar
    • 23. R. Zanibbi, K. Davila, A. Kane and F. W. Tompa, Multi-stage math formula search: Using appearance-based similarity metrics at scale, Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016, pp. 145–154. Google Scholar
    • 24. D. Stalnaker and R. Zanibbi, Math expression retrieval using an inverted index over symbol pairs, Document Recognition and Retrieval XXII, 9402 (2015) 940207. Google Scholar
    • 25. B. Mansouri, R. Zanibbi and D. W. Oard, Characterizing searches for mathematical concepts, in 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (IEEE, 2019), pp. 57–66. CrossrefGoogle Scholar
    • 26. T. Suzuki and A. Fujii, Mathematical document categorization with structure of mathematical expressions, in 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (IEEE, 2017), pp. 1–10. CrossrefGoogle Scholar
    • 27. W. Zhong and R. Zanibbi, Structural similarity search for formulas using leaf-root paths in operator subtrees, Proceedings of the European Conference on Information Retrieval (ECIR), January 2019. Google Scholar
    • 28. R. Zanibbi, A. Aizawa, M. Kohlhase, I. Ounis, G. Topić and K. Davila, NTCIR-12 MathIR task overview, 12th NTCIR Conference on Evaluation of Information Access Technologies, Japan, 2016, pp. 299–308. Google Scholar
    • 29. A. Marowka, Extending OPENMP for task parallelism, Parallel Processing Letters 13(3) (2003) 341–352. LinkGoogle Scholar
    • 30. R. Rufai, M. Bozyigit, J. Alghamdi and M. Ahmed, Multithreaded parallelism with OPENMP, Parallel Processing Letters 15(4) (2005) 367–378. LinkGoogle Scholar
    • 31. P. E. Hadjidoukas and L. Amsaleg, Nested OPENMP parallelization of a hierarchical data clustering algorithm, Parallel Processing Letters 20(2) (2010) 187–208. LinkGoogle Scholar
    • 32. Y. You, H. Fu, S. L. Song, A. Randles, D. Kerbyson, A. Marquez, G. Yang and A. Hoisie, Scaling support vector machines on modern HPC platforms, Journal of Parallel and Distributed Computing 76 (2015) 16–31. Crossref, ISIGoogle Scholar
    • 33. F. Drews, K. Ecker, O. Kao and S. Schomann, Strategies for workload balancing in cluster-based image databases, Parallel Processing Letters 14(1) (2004) 33–44. LinkGoogle Scholar
    • 34. A. Crivellini, M. Franciolini, A. Colombo and F. Bassi, OpenMP parallelization strategies for a discontinuous galerkin solver, International Journal of Parallel Programming 47(5-6) (2019) 838–873. Crossref, ISIGoogle Scholar
    • 35. T. Weng, R. Perng and B. M. Chapman, OpenMP implementation of SPICE3 circuit simulator, International Journal of Parallel Programming 35(5) (2007) 493–505. Crossref, ISIGoogle Scholar
    • 36. A. V. Gerbessiotis, A study of integer sorting on multicores, Parallel Processing Letters 28(4) (2018) 1850014. Link, ISIGoogle Scholar
    • 37. R. Shikder, P. Thulasiraman, P. Irani and P. Hu, An OpenMP-based tool for finding longest common subsequence in bioinformatics, BMC Research Notes 12 (2019). CrossrefGoogle Scholar
    • 38. M. Lu and H. Lin, Parallel algorithms for the longest common subsequence problem, IEEE Transactions on Parallel and Distributed Systems 5(8) (1994) 835–848. Crossref, ISIGoogle Scholar
    • 39. J.-F. Myoupo and D. Semé, Time-efficient parallel algorithms for the longest common subsequence and related problems, Journal of Parallel and Distributed Computing 57(2) (1999) 212–223. Crossref, ISIGoogle Scholar
    • 40. X. Xu, L. Chen, Y. Pan and P. He, Fast parallel algorithms for the longest common subsequence problem using an optical bus, in Computational Science and Its Applications – ICCSA 2005 (Springer Berlin Heidelberg, 2005), pp. 338–348. CrossrefGoogle Scholar
    • 41. V. K. Tchendji, A. N. Ngomade, J. L. Zeutouo and J. F. Myoupo, Efficient CGM-based parallel algorithms for the longest common subsequence problem with multiple substring-exclusion constraints, Parallel Computing 91 (2020) 102598. Crossref, ISIGoogle Scholar
    • 42. I. Peláez, F. Almeida and D. González, High level parallel skeletons for dynamic programming, Parallel Processing Letters 18(1) (2008) 133–147. LinkGoogle Scholar
    • 43. P. Pavan Kumar, PACME – Mathematical Expression Image Database and its LaTeX Ground Truth, https://sites.google.com/site/ppkumarcs/datasets (2018). Google Scholar
    • 44. P. Pavan Kumar, A. Agarwal and C. Bhagvati, Isolated structural error analysis of printed mathematical expressions, Pattern Analysis and Applications 21(4) (2018) 1097–1107. Crossref, ISIGoogle Scholar
    • 45. E. W. Edmiston, N. G. Core, J. H. Saltz and R. M. Smith, Parallel processing of biological sequence comparison algorithms, International Journal of Parallel Programming 17(3) (1988) 259–275. Crossref, ISIGoogle Scholar
    • 46. C. Dobre and F. Xhafa, Parallel programming paradigms and frameworks in big data era, International Journal of Parallel Programming 42(5) (2014) 710–738. Crossref, ISIGoogle Scholar