We present a model of morphological segmentation that jointly learns to segment and restore orthographic changes, e.g., funniest 7 → fun-y-est. We term this form of analysis canonical segmentation and contrast it with the traditional surface segmentation, which segments a surface form into a sequence of substrings, e.g., funniest 7 → funn-i-est. We derive an importance sampling algorithm for approximate inference in the model and report experimental results on English, German and Indonesian.
Add the full text or supplementary notes for the publication here using Markdown formatting.