This Issue


International Journal of Foundations of Computer Science: Vol. 14, No. 06
Print ISSN: 0129-0541
Online ISSN: 1793-6373

 
writing-guides   ealerts
Connect with WS
 


EDIT-DISTANCE OF WEIGHTED AUTOMATA: GENERAL DEFINITIONS AND ALGORITHMS

MEHRYAR MOHRI

AT&T Labs – Research, 180 Park Avenue, Rm E135, Florham Park, NJ 07932, USA

Received: 15 November 2002
Accepted: 14 March 2002

The problem of computing the similarity between two sequences arises in many areas such as computational biology and natural language processing. A common measure of the similarity of two strings is their edit-distance, that is the minimal cost of a series of symbol insertions, deletions, or substitutions transforming one string into the other. In several applications such as speech recognition or computational biology, the objects to compare are distributions over strings, i.e., sets of strings representing a range of alternative hypotheses with their associated weights or probabilities. We define the edit-distance of two distributions over strings and present algorithms for computing it when these distributions are given by automata. In the particular case where two sets of strings are given by unweighted automata, their edit-distance can be computed using the general algorithm of composition of weighted transducers combined with a single-source shortest-paths algorithm. In the general case, we show that general weighted automata algorithms over the appropriate semirings can be used to compute the edit-distance of two weighted automata exactly. These include classical algorithms such as the composition and ∊-removal of weighted transducers and a new and simple synchronization algorithm for weighted transducers which, combined with ∊-removal, can be used to normalize weighted transducers with bounded delays. Our algorithm for computing the edit-distance of weighted automata can be used to improve the word accuracy of automatic speech recognition systems. It can also be extended to provide an edit-distance automaton useful for re-scoring and other post-processing purposes in the context of large-vocabulary speech recognition.

Cited by (20):
, , . (2016) Approximate matching between a context-free grammar and a finite-state automaton. Information and Computation 247, 278-289. Online publication date: 1-Apr-2016. [CrossRef]
, , . 2016. Computing the Expected Edit Distance from a String to a PFA. Implementation and Application of Automata, 39-50. [CrossRef]
. 2016. Prefix Distance Between Regular Languages. Implementation and Application of Automata, 224-235. [CrossRef]
, , . (2013) Bounded repairability of word languages. Journal of Computer and System Sciences 79:8, 1302-1321. Online publication date: 1-Dec-2013. [CrossRef]
, , . (2013) THE EDIT-DISTANCE BETWEEN A REGULAR LANGUAGE AND A CONTEXT-FREE LANGUAGE. International Journal of Foundations of Computer Science 24:07, 1067-1082. Online publication date: 1-Nov-2013. [Abstract | PDF (273 KB) | PDF Plus (283 KB)]
, , . (2012) Simple and Efficient Model Filtering in Statistical Machine Translation. The Prague Bulletin of Mathematical Linguistics 98:-1. Online publication date: 1-Jan-2012. [CrossRef]
, , . (2011) GENERAL ALGORITHMS FOR TESTING THE AMBIGUITY OF FINITE AUTOMATA AND THE DOUBLE-TAPE AMBIGUITY OF FINITE-STATE TRANSDUCERS. International Journal of Foundations of Computer Science 22:04, 883-904. Online publication date: 1-Jun-2011. [Abstract | PDF (381 KB) | PDF Plus (389 KB)]
, , . (2011) Regular Repair of Specifications. 2011 IEEE 26th Annual Symposium on Logic in Computer Science, 335-344. [CrossRef]
, , . (2011) Deciding word neighborhood with universal neighborhood automata. Theoretical Computer Science 412:22, 2340-2355. Online publication date: 1-May-2011. [CrossRef]
, , , , . (2011) WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding. IEEE Transactions on Audio, Speech, and Language Processing. Online publication date: 1-Jan-2011. [CrossRef]
, . (2010) Alternative Approaches for Workflow Similarity. 2010 IEEE International Conference on Services Computing, 337-345. [CrossRef]
, , . (2010) On implementing recognizable transductions. International Journal of Computer Mathematics 87:2, 260-277. Online publication date: 1-Feb-2010. [CrossRef]
, . (2009) N-WAY COMPOSITION OF WEIGHTED FINITE-STATE TRANSDUCERS. International Journal of Foundations of Computer Science 20:04, 613-627. Online publication date: 1-Aug-2009. [Abstract | PDF (264 KB) | PDF Plus (267 KB)]
, . (2009) On the definition of stochastic λ-transducers. International Journal of Computer Mathematics 86:8, 1300-1310. Online publication date: 1-Aug-2009. [CrossRef]
. (2007) Computing the edit distance of a regular language. Information and Computation 205:9, 1307-1316. Online publication date: 1-Sep-2007. [CrossRef]
, . (2006) Piloting an Empirical Study on Measures forWorkflow Similarity. 2006 IEEE International Conference on Services Computing (SCC'06), 94-102. [CrossRef]
, , , . (2006) Metrics for Action-labelled Quantitative Transition Systems. Electronic Notes in Theoretical Computer Science 153:2, 79-96. Online publication date: 1-May-2006. [CrossRef]
. (2005) Computing the Levenshtein distance of a regular language. IEEE Information Theory Workshop, 2005., 4 pp.. [CrossRef]
, , , . (2005) Minimum exact word error training. IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., 186-190. [CrossRef]
, , , . A generalized construction of integrated speech recognition transducers. 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, I-761-4. [CrossRef]