WORD EXTRACTION ASSOCIATED WITH A CONFIDENCE INDEX FOR ONLINE HANDWRITTEN SENTENCE RECOGNITION
Abstract
This paper presents a word extraction approach that uses a confidence index to limit the total number of segmentation hypotheses, with the aim of extending our online sentence recognition system to "on-the-fly" recognition. The initial word extraction task characterizes the gap between each pair of consecutive strokes in the online signal of the handwritten sentence. A confidence index is associated with each gap classification result to evaluate its reliability. A reconsideration process then creates additional segmentation hypotheses so that the correct segmentation is likely to be among them. In this process, we control the total number of segmentation hypotheses to limit the complexity of the recognition process and thus the execution time. The approach is evaluated on a test set of 425 English sentences written by 17 writers, using several metrics to analyze the impact of the word extraction task on the performance of the whole sentence recognition system. With the best reconsideration strategy, the word extraction task achieves a 97.94% word extraction rate and an 84.85% word recognition rate, which represents a 33.1% relative decrease in word error rate compared with the initial word extraction task (with no reconsideration of segmentation hypotheses).
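The abstract does not give the formulas behind the confidence index or the reconsideration strategy, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a gap classifier that outputs, for each gap, a probability of being an inter-word gap; the confidence index `2 * |p - 0.5|`, the threshold `conf_threshold`, the cap `max_hypotheses`, and the function name `segmentation_hypotheses` are all hypothetical and are not taken from the paper.

```python
from itertools import product


def segmentation_hypotheses(gap_probs, conf_threshold=0.6, max_hypotheses=8):
    """Enumerate word segmentation hypotheses for one sentence.

    gap_probs[i] is an assumed classifier output: the probability that
    gap i (between strokes i and i+1) separates two words.
    Each hypothesis is a tuple of booleans, True meaning "word boundary".
    """
    if max_hypotheses < 1:
        raise ValueError("max_hypotheses must be at least 1")

    # Initial segmentation: hard decision on every gap.
    initial = [p >= 0.5 for p in gap_probs]

    # Hypothetical confidence index in [0, 1]: distance from the decision boundary.
    confidence = [abs(p - 0.5) * 2 for p in gap_probs]

    # Gaps whose classification is unreliable, least confident first.
    uncertain = sorted(
        (i for i, c in enumerate(confidence) if c < conf_threshold),
        key=lambda i: confidence[i],
    )

    # Reconsidering k gaps yields 2**k hypotheses, so cap k to stay within budget.
    k = min(len(uncertain), max_hypotheses.bit_length() - 1)
    reconsidered = uncertain[:k]

    hypotheses = []
    for flips in product([False, True], repeat=k):
        hyp = list(initial)
        for idx, flip in zip(reconsidered, flips):
            if flip:
                hyp[idx] = not hyp[idx]  # try the alternative label for this gap
        hypotheses.append(tuple(hyp))
    return hypotheses


if __name__ == "__main__":
    # Four gaps: two classified confidently, two close to the decision boundary.
    for hyp in segmentation_hypotheses([0.05, 0.55, 0.95, 0.40]):
        print(hyp)
```

The first hypothesis returned is always the initial segmentation, and lowering `max_hypotheses` trades coverage of the correct segmentation against the cost of running the recognizer on every hypothesis, which is the trade-off the abstract describes.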