Publications

On the Optimality of Word Lengths

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Children can acquire language from less than 100 million words of input. Large language models are far less data-efficient: they …

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and …

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically …

State-of-the-art generalisation research in NLP: a taxonomy and review

SIGMORPHON 2021 Shared Task on Morphological Reinflection: Generalization Across Languages

This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and …

SIGTYP 2021 Shared Task: Robust Spoken Language Identification

While language identification is a fundamental speech and language processing task, for many languages and language families it remains …

SIGTYP 2020 Shared Task: Prediction of Typological Features

Typological knowledge bases (KBs) such as WALS (Dryer and Haspelmath, 2013) contain information about linguistic properties of the …

SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most …

UniMorph 3.0: Universal Morphology

The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological …

The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection

The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual analysis in morphology examined transfer learning of …

The CoNLL–SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

The CoNLL-SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically …

UniMorph 2.0: Universal Morphology

The Universal Morphology (UniMorph) project is a collaborative effort to improve how NLP handles complex morphology across the world’s …

CoNLL–SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

The CoNLL-SIGMORPHON 2017 shared task on supervised morphological generation required systems to be trained and tested in each of 52 …

Contrastive Morphological Typology and Logical Hierarchies

Analysis of Morphology in Topic Modeling

Translation of the CALLHOME Egyptian Arabic Corpus for Conversational Speech Translation

An Algerian Arabic-French Code-Switched Corpus