Stochastic Contextual Edit Distance and Probabilistic FSTs

Ryan Cotterell, Nanyun Peng, Jason Eisner

June 2014

Abstract

String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance, a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-state transducer that computes our stochastic contextual edit distance. To illustrate the improvement from conditioning on context, we model typos found in social media text.

Type

Conference paper

Publication

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics

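As a supplementary illustration of the idea in the abstract, the following is a minimal sketch of how a conditional probability p(y | x) can be computed by summing over all edit sequences with dynamic programming. It is not the paper's implementation: the toy alphabet, the function names (`constant_params`, `cond_prob`, `total_mass`), the parameter values, and the choice of context (previous output character and next input character) are all assumptions made for this example. The paper's model uses trained, feature-based probabilities and a transducer construction that this sketch does not reproduce.

```python
from itertools import product

ALPHABET = "ab"  # toy alphabet, assumed for illustration only


def constant_params(prev_out, next_in):
    """Hypothetical parameters (p_insert, p_delete, p_copy).

    A contextual model would return different values depending on
    prev_out and next_in; this placeholder ignores the context.
    """
    return 0.1, 0.1, 0.9


def cond_prob(x, y, params=constant_params, alphabet=ALPHABET):
    """Sum the probabilities of all edit sequences that turn x into y."""
    n, m, k = len(x), len(y), len(alphabet)
    # alpha[i][j]: probability of having read x[:i] and written y[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            a = alpha[i][j]
            if a == 0.0:
                continue
            prev_out = y[j - 1] if j > 0 else None
            next_in = x[i] if i < n else None
            p_ins, p_del, p_copy = params(prev_out, next_in)
            if j < m:
                # insert y[j]; insertion mass is spread uniformly over the alphabet
                alpha[i][j + 1] += a * p_ins / k
            if i < n:
                # delete x[i]
                alpha[i + 1][j] += a * p_del
                if j < m:
                    # substitute x[i] -> y[j] (a copy when the characters match)
                    p_sub = 1.0 - p_ins - p_del
                    p_char = p_copy if y[j] == x[i] else (1.0 - p_copy) / (k - 1)
                    alpha[i + 1][j + 1] += a * p_sub * p_char
    # at the end of the input the only choices are insert or halt
    p_ins_end, _, _ = params(y[-1] if m else None, None)
    return alpha[n][m] * (1.0 - p_ins_end)


def total_mass(x, max_len, alphabet=ALPHABET):
    """Sum p(y | x) over every y of length <= max_len (sanity check)."""
    return sum(
        cond_prob(x, "".join(chars), alphabet=alphabet)
        for length in range(max_len + 1)
        for chars in product(alphabet, repeat=length)
    )
```

Because every state chooses among its available operations with probabilities that sum to one, `total_mass("ab", 8)` should come out close to 1, with the remainder belonging to longer outputs. Passing a different `params` function is where context-dependent behavior would enter in this sketch.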