A Joint Model of Orthography and Morphological Segmentation

Ryan Cotterell, Tim Vieira, Hinrich Schütze

June 2016

Abstract

We present a model of morphological segmentation that jointly learns to segment and restore orthographic changes, e.g., funniest ↦ fun-y-est. We term this form of analysis canonical segmentation and contrast it with the traditional surface segmentation, which segments a surface form into a sequence of substrings, e.g., funniest ↦ funn-i-est. We derive an importance sampling algorithm for approximate inference in the model and report experimental results on English, German and Indonesian.
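A small Python sketch can make the contrast between the two analyses concrete. A surface segmentation is a split of the word itself, so its pieces always concatenate back to the word. A canonical segmentation does not have that property: joining fun, y and est gives "funyest", not "funniest", so orthographic changes have to be restored. The helper names below are hypothetical. `toy_restore` uses two hand-written rules (consonant doubling, y becomes i before e) that are tuned to this single example only. It is not the paper's model, which learns these changes jointly with segmentation.

```python
def is_surface_segmentation(word: str, segments: list[str]) -> bool:
    # A surface segmentation consists of non-empty substrings that,
    # in order, cover the word exactly.
    return all(segments) and "".join(segments) == word


def toy_restore(canonical: list[str]) -> str:
    # Hypothetical, hand-written rules for illustration only; a real
    # canonical segmenter would learn such changes from data.
    out = canonical[0]
    for seg in canonical[1:]:
        # Toy rule 1: double a final consonant after a vowel before suffix "y" (fun + y -> funny).
        if seg == "y" and len(out) >= 2 and out[-1] not in "aeiou" and out[-2] in "aeiou":
            out += out[-1]
        # Toy rule 2: stem-final "y" becomes "i" before a suffix starting with "e" (funny + est -> funniest).
        if out.endswith("y") and seg.startswith("e"):
            out = out[:-1] + "i"
        out += seg
    return out


word = "funniest"
surface = ["funn", "i", "est"]
canonical = ["fun", "y", "est"]

print(is_surface_segmentation(word, surface))    # True
print(is_surface_segmentation(word, canonical))  # False: "funyest" != "funniest"
print(toy_restore(canonical))                    # funniest
```

The check shows why canonical segmentation is a harder task than surface segmentation: the segments alone do not reproduce the word, so a model must also account for the orthographic mapping back to the surface form.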

Type

Conference paper

Publication

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
