World Scientific

SENTENCE ALIGNMENT USING FEED FORWARD NEURAL NETWORK

    Parallel corpora have become an essential resource for work in multilingual natural language processing. Sentence-aligned parallel corpora, however, are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature vector is extracted from the text pair under consideration; it contains text features such as length, punctuation score, and cognate score values. A manually prepared set of training data was used to train the feed-forward neural network, and a separate set of data was used for testing. With this approach, we achieved an error reduction of 60% over the length-based approach when applied to English–Arabic parallel documents. Moreover, the approach is valid for any language pair and is quite flexible, since the feature vector may contain more, fewer, or different features than those used in our system, such as a lexical match feature.
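
    The idea of scoring a candidate sentence pair from a small feature vector can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the features (length, punctuation score, cognate score) but does not define them, so the formulas below, the function names, the network size, and the hand-picked weights are all assumptions made for illustration. In practice the weights would be learned from the manually aligned training data.

```python
import math
import string

# ASCII punctuation plus a few common Arabic marks (an assumed set).
PUNCT = set(string.punctuation) | {"،", "؛", "؟"}


def _ratio(a, b):
    """Shorter-to-longer ratio in [0, 1]; 1.0 when both counts are zero."""
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


def length_score(src, tgt):
    """Assumed length feature: ratio of character lengths."""
    return _ratio(len(src), len(tgt))


def punctuation_score(src, tgt):
    """Assumed punctuation feature: ratio of punctuation-mark counts."""
    count = lambda text: sum(ch in PUNCT for ch in text)
    return _ratio(count(src), count(tgt))


def cognate_score(src, tgt):
    """Assumed cognate feature: share of tokens that appear verbatim on both
    sides (numbers, names, symbols left untranslated)."""
    strip_chars = "".join(PUNCT)
    tokens = lambda text: {w.strip(strip_chars) for w in text.split()} - {""}
    s, t = tokens(src), tokens(tgt)
    return 0.0 if not s or not t else len(s & t) / max(len(s), len(t))


def feature_vector(src, tgt):
    """Feature vector for one candidate sentence pair."""
    return [length_score(src, tgt),
            punctuation_score(src, tgt),
            cognate_score(src, tgt)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer feed-forward network.
    Returns a score in (0, 1); higher means 'more likely aligned'."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical, hand-picked weights (NOT trained, NOT from the paper).
W_HIDDEN = [[2.0, 2.0, 2.0], [-1.0, -1.0, -1.0]]
B_HIDDEN = [-3.0, 1.0]
W_OUT = [3.0, -3.0]
B_OUT = 0.0


def alignment_score(src, tgt):
    """Score a candidate pair with the illustrative network above."""
    return forward(feature_vector(src, tgt), W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

    Because the classifier only sees the feature vector, adding a further feature such as a lexical match score would mean appending one value in `feature_vector` and one weight per hidden unit, which is the flexibility the abstract refers to.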
