Differentiable Subset Pruning of Transformer Heads

Publication
Transactions of the Association for Computational Linguistics