A Structured Variational Autoencoder for Contextual Morphological Inflection

Abstract

Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.

Publication
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics

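The abstract mentions a variational inference procedure based on the wake-sleep algorithm. The following is a deliberately tiny, self-contained sketch of the wake-sleep idea only, not the paper's structured neural VAE: a generative model p(x | z) and a recognition model q(z | x) are trained in alternation, with q seeded by a weak heuristic to mimic the semi-supervised setting. All names, the toy tag set, and the toy forms are invented for illustration.

```python
import random
from collections import defaultdict

random.seed(0)

# Toy setting (illustrative only): latent z = morphological tag,
# observed x = an inflected surface form.
TAGS = ["PL", "SG"]
FORMS = ["cats", "cat", "dogs", "dog"]

# Unlabeled token-level "raw data" the semi-supervised learner sees.
DATA = ["cats", "dogs", "cat", "dog", "cats", "dogs"]

# Parameters as smoothed count tables (categorical distributions).
gen_counts = {z: defaultdict(lambda: 1.0) for z in TAGS}   # p(x | z)
rec_counts = {x: defaultdict(lambda: 1.0) for x in FORMS}  # q(z | x)

def sample(weights):
    """Draw one item from an unnormalized categorical distribution."""
    items, ws = zip(*weights.items())
    return random.choices(items, weights=ws, k=1)[0]

# Seed q with a weak supervised cue (-s suggests plural), standing in
# for the small amount of labeled type-level data.
for form in FORMS:
    cue = "PL" if form.endswith("s") else "SG"
    rec_counts[form][cue] += 5.0

for epoch in range(50):
    # Wake phase: explain real data with samples from q(z | x),
    # then update the generative model p(x | z).
    for x in DATA:
        z = sample({z: rec_counts[x][z] for z in TAGS})
        gen_counts[z][x] += 1.0
    # Sleep phase: "dream" (z, x) pairs from the generative model,
    # then update the recognition model q(z | x) on the dreams.
    for _ in range(len(DATA)):
        z = random.choice(TAGS)                  # uniform p(z) for simplicity
        x = sample({x: gen_counts[z][x] for x in FORMS})
        rec_counts[x][z] += 1.0

def q_map(x):
    """MAP tag under the learned recognition model."""
    return max(TAGS, key=lambda z: rec_counts[x][z])
```

After training, `q_map` recovers the tag consistent with the seeding heuristic for each form; in the paper this role is played by a neural recognition network over lemmata, tags, and forms rather than count tables.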