The Evolution of Automatic Metrics: From String Matching to Black-Box LLMs

Abstract

For years, progress in NLP modeling has outpaced progress in evaluation, which has relied predominantly on string-matching metrics. In this talk, we will outline the benefits of, and differences among, three classes of metrics: n-gram matching (such as chrF or BLEU), pretrained models (COMET, BLEURT), and black-box LLMs (GEMBA). We will focus primarily on emerging LLM-based evaluation, highlighting open questions and the challenges anticipated in the upcoming era.

Date
May 24, 2023, 10:30 AM to 12:00 PM
Location
OAT S13

Bio

Tom Kocmi is a researcher at Microsoft Translator focusing on human and automatic evaluation of machine translation. He coordinates the annual WMT General MT shared task, in which researchers from both academia and industry compete to build the best-performing MT systems.