World Scientific

A Statistical Model for Automatic Extraction of Korean Transliterated Foreign Words

    https://doi.org/10.1142/S021942790300084X
    Cited by: 3 (Source: Crossref)

    In this paper, we describe an algorithm for extracting transliterated foreign words from Korean text. We reformulate foreign word extraction as a syllable-tagging problem in which each syllable is tagged as either a foreign syllable or a pure Korean syllable. Syllable sequences of Korean strings are modelled by a Hidden Markov Model whose states each represent a character together with a binary mark indicating whether the syllable is part of a transliterated foreign word. The proposed method extracts transliterated foreign words with high recall and precision. Moreover, it performs well even when trained on small corpora.
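
    The syllable-tagging formulation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each HMM state is a (syllable, tag) pair, estimates state-to-state transitions from a tiny hand-tagged corpus with add-one smoothing, and decodes with Viterbi. The tag names `K` and `F`, the smoothing scheme, the function names and the toy training sentences are all assumptions made for illustration.

```python
import math

TAGS = ("K", "F")      # K = pure Korean syllable, F = foreign syllable (assumed labels)
START = ("<s>", "K")   # hypothetical sentence-initial state


def train(tagged_sentences):
    """Count transitions between consecutive (syllable, tag) states."""
    trans, vocab = {}, set()
    for sentence in tagged_sentences:
        prev = START
        for state in sentence:
            counts = trans.setdefault(prev, {})
            counts[state] = counts.get(state, 0) + 1
            vocab.add(state[0])
            prev = state
    return trans, vocab


def log_prob(trans, vocab, prev, cur):
    """Add-one smoothed log P(cur | prev); the smoothing is an assumption."""
    counts = trans.get(prev, {})
    n_states = (len(vocab) + 1) * len(TAGS)  # +1 reserves mass for unseen syllables
    return math.log((counts.get(cur, 0) + 1) / (sum(counts.values()) + n_states))


def viterbi(trans, vocab, syllables):
    """Return the most likely tag sequence for a list of syllables."""
    if not syllables:
        return []
    # best[tag] = (log score, tag path) of the best path ending in that tag
    best = {t: (log_prob(trans, vocab, START, (syllables[0], t)), [t]) for t in TAGS}
    for i in range(1, len(syllables)):
        new_best = {}
        for t in TAGS:
            cur = (syllables[i], t)
            candidates = [
                (best[p][0] + log_prob(trans, vocab, (syllables[i - 1], p), cur),
                 best[p][1] + [t])
                for p in TAGS
            ]
            new_best[t] = max(candidates, key=lambda c: c[0])
        best = new_best
    return max(best.values(), key=lambda c: c[0])[1]


def extract_foreign(syllables, tags):
    """Join each maximal run of F-tagged syllables into one extracted word."""
    words, run = [], []
    for syl, tag in zip(syllables, tags):
        if tag == "F":
            run.append(syl)
        elif run:
            words.append("".join(run))
            run = []
    if run:
        words.append("".join(run))
    return words


# Toy hand-tagged corpus (invented for this sketch, spaces removed).
corpus = [
    [("컴", "F"), ("퓨", "F"), ("터", "F"), ("를", "K"), ("샀", "K"), ("다", "K")],
    [("인", "F"), ("터", "F"), ("넷", "F"), ("을", "K"), ("쓴", "K"), ("다", "K")],
]
trans, vocab = train(corpus)
syllables = list("컴퓨터를쓴다")
tags = viterbi(trans, vocab, syllables)
print(extract_foreign(syllables, tags))  # ['컴퓨터'] on this toy corpus
```

    Because a state carries the syllable itself, a single transition table captures both which syllables tend to follow each other and how the foreign/Korean mark changes across them. A practical system would need a more careful smoothing or back-off scheme than the add-one estimate used here.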
