Jiaoda Li
Latest
Characterizing the Expressivity of Transformer Language Models
Probability Distributions Computed by Hard-Attention Transformers
Unique Hard Attention: A Tale of Two Sides
A Transformer with Stack Attention
What Do Language Models Learn in Context? The Structured Task Hypothesis.
Probing via Prompting
Differentiable Subset Pruning of Transformer Heads