SENTENCE ALIGNMENT USING FEED FORWARD NEURAL NETWORK
Abstract
Parallel corpora have become an essential resource for work in multilingual natural language processing. Sentence-aligned parallel corpora, however, are more effective than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to aligning sentences in bilingual parallel corpora, based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration; it contains text features such as length, punctuation score, and cognate score values. One set of manually prepared data was used to train the feed-forward neural network, and another set was used for testing. With this approach, we achieved an error reduction of 60% over the length-based approach when applied to English–Arabic parallel documents. Moreover, the approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those used in our system, such as a lexical match feature.
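As a rough illustration of the pipeline the abstract describes, the sketch below builds a three-element feature vector (length, punctuation, cognate) for a candidate sentence pair and passes it through a small feed-forward network with one hidden layer. The abstract does not define the features or the network layout, so everything here is an assumption made for illustration: the function names, the punctuation inventory, the use of shared numeric tokens as a stand-in for cognates, the min/max ratio scoring, and the untrained placeholder weights.

```python
import math
import string

# Assumed punctuation inventory: ASCII marks plus common Arabic ones.
PUNCT = set(string.punctuation) | {"\u060C", "\u061B", "\u061F"}


def ratio(a, b):
    """Return min/max of two non-negative counts; 1.0 when both are zero."""
    if a == 0 and b == 0:
        return 1.0
    return min(a, b) / max(a, b)


def length_score(src, tgt):
    # Agreement in character length between the two candidate sentences.
    return ratio(len(src), len(tgt))


def punctuation_score(src, tgt):
    # Agreement in the number of punctuation marks on each side.
    def count(s):
        return sum(ch in PUNCT for ch in s)
    return ratio(count(src), count(tgt))


def cognate_score(src, tgt):
    # Hypothetical cognate proxy: digit-bearing tokens shared by both sides.
    strip_chars = "".join(PUNCT)

    def numeric_tokens(s):
        return {t.strip(strip_chars) for t in s.split()
                if any(c.isdigit() for c in t)}

    a, b = numeric_tokens(src), numeric_tokens(tgt)
    if not a and not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


def feature_vector(src, tgt):
    return [length_score(src, tgt),
            punctuation_score(src, tgt),
            cognate_score(src, tgt)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with sigmoid units; returns a score in (0, 1)."""
    hidden = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Placeholder weights (NOT trained values): 3 inputs, 2 hidden units, 1 output.
W_HIDDEN = [[1.5, 1.0, 2.0], [-1.0, 0.5, 1.0]]
B_HIDDEN = [-1.0, 0.0]
W_OUT = [2.0, 1.0]
B_OUT = -1.5

if __name__ == "__main__":
    pair = ("The meeting was held in 1991, in Berkeley.",
            "La reunion a eu lieu en 1991, a Berkeley.")
    x = feature_vector(*pair)
    print("features:", x)
    print("alignment score:", forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

In a real system the weights would be learned from the manually aligned training pairs, and the output score would be thresholded or compared across candidate pairs to decide which sentences align. Adding a feature such as lexical match would only require appending another value to the vector and widening the input layer.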


