Jiaoda Li
Latest
Characterizing the Expressivity of Transformer Language Models
Probability Distributions Computed by Hard-Attention Transformers
Unique Hard Attention: A Tale of Two Sides
A Transformer with Stack Attention
What Do Language Models Learn in Context? The Structured Task Hypothesis.
Probing via Prompting
Differentiable Subset Pruning of Transformer Heads