Jiaoda Li
Latest
Unique Hard Attention: A Tale of Two Sides
A Transformer with Stack Attention
What Do Language Models Learn in Context? The Structured Task Hypothesis.
Probing via Prompting
Differentiable Subset Pruning of Transformer Heads