Generative AI has matured to a point where large-scale models can generate text that is often indistinguishable from human-written text, as well as remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers, which capture the two types of errors in generative modeling: the model producing unrealistic samples, and the model failing to cover the diversity of the real data. We explore the statistical estimation of these frontiers and study their estimation rates. Empirically, we find that the proposed scores, paired with a range of f-divergences and statistical estimation methods, can quantify the gaps between the distributions of human-written text and those of modern neural language models: they correlate with human judgments and identify known properties of the generated texts. We conclude by discussing applications to other AI domains and some future directions.
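To make the divergence-frontier idea concrete, below is a minimal sketch of how a MAUVE-style score can be computed, assuming the text samples have already been embedded as feature vectors. The function name mauve_like_score, the cluster count k, the scaling constant c, and the mixture grid are illustrative choices, not the paper's tuned defaults; the official mauve package handles embedding, quantization, and smoothing more carefully.

```python
import numpy as np
from sklearn.cluster import KMeans

def mauve_like_score(p_feats, q_feats, k=8, c=1.0, num_lambdas=25):
    """Quantize two embedded sample sets into k shared bins, trace the
    scaled KL-divergence frontier between the resulting histograms, and
    return the area under that frontier (close to 1 when they match)."""
    # Joint k-means quantization so both sample sets share the same bins.
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
        np.vstack([p_feats, q_feats]))

    def hist(feats):
        counts = np.bincount(km.predict(feats), minlength=k).astype(float)
        counts += 1e-6                      # smooth away empty bins
        return counts / counts.sum()

    p, q = hist(p_feats), hist(q_feats)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))

    # Frontier: each mixture weight lam trades off the two error types
    # against the mixture R = lam * P + (1 - lam) * Q.
    xs, ys = [0.0], [1.0]                   # boundary point (KL = inf, 0)
    for lam in np.linspace(0.01, 0.99, num_lambdas):
        r = lam * p + (1 - lam) * q
        xs.append(np.exp(-c * kl(q, r)))    # error: missing real-data modes
        ys.append(np.exp(-c * kl(p, r)))    # error: unrealistic samples
    xs.append(1.0); ys.append(0.0)          # boundary point (KL = 0, inf)

    # Summarize the frontier by its area (trapezoidal rule).
    order = np.argsort(xs)
    x, y = np.array(xs)[order], np.array(ys)[order]
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Toy usage: Gaussian clouds standing in for human / model embeddings.
rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=(500, 16))
model = rng.normal(0.7, 1.2, size=(500, 16))
print(mauve_like_score(human[:250], human[250:]))  # same distribution: near 1
print(mauve_like_score(human, model))              # distribution gap: below 1
```

The sketch uses KL divergence for concreteness; as the abstract notes, the same frontier construction extends to a range of f-divergences and to different statistical estimators of the quantized distributions.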
Krishna Pillutla is a visiting researcher (postdoc) at Google Research on the Federated Learning team. He is broadly interested in machine learning, artificial intelligence, optimization, robustness, and privacy, specifically in the settings of federated learning and text generation. He obtained his PhD from the Paul G. Allen School of Computer Science & Engineering at the University of Washington.