Hugging Face Uses Block Pruning to Speed Up Transformer Training While Maintaining Accuracy

A research team from Hugging Face introduces a block pruning approach that targets models that are both small and fast. The approach learns to eliminate full components of the original model, effectively dropping a large number of attention heads.
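
To make the idea of pruning in blocks concrete, here is a minimal sketch in plain Python. It is not the team's implementation. It assumes blocks are scored by the sum of their squared weights, a simple magnitude criterion swapped in for illustration, whereas the approach described above learns which components to remove. The function name `block_prune` and its parameters `block_rows`, `block_cols` and `keep_ratio` are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def block_prune(weights: Matrix, block_rows: int, block_cols: int,
                keep_ratio: float) -> Matrix:
    """Return a copy of `weights` with the lowest-scoring blocks zeroed.

    Assumption: a block's score is its squared L2 norm (magnitude),
    a stand-in for the learned criterion of the actual method.
    """
    n_rows, n_cols = len(weights), len(weights[0])
    if n_rows % block_rows or n_cols % block_cols:
        raise ValueError("matrix shape must be divisible by the block shape")

    # Score every block, keyed by the (row, col) of its top-left corner.
    scores: Dict[Tuple[int, int], float] = {}
    for bi in range(0, n_rows, block_rows):
        for bj in range(0, n_cols, block_cols):
            scores[(bi, bj)] = sum(
                weights[i][j] ** 2
                for i in range(bi, bi + block_rows)
                for j in range(bj, bj + block_cols)
            )

    # Keep the top-scoring fraction of blocks (at least one).
    n_keep = max(1, round(keep_ratio * len(scores)))
    kept = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])

    # Zero every block that was not kept.
    pruned = [row[:] for row in weights]
    for (bi, bj) in scores:
        if (bi, bj) not in kept:
            for i in range(bi, bi + block_rows):
                for j in range(bj, bj + block_cols):
                    pruned[i][j] = 0.0
    return pruned


# Toy 4x4 matrix split into four 2x2 blocks; keep the strongest half.
weights = [
    [0.90, 0.80, 0.01, 0.02],
    [0.70, 0.60, 0.03, 0.01],
    [0.02, 0.01, 0.50, 0.40],
    [0.01, 0.03, 0.30, 0.60],
]
pruned = block_prune(weights, block_rows=2, block_cols=2, keep_ratio=0.5)
```

In this toy case the two off-diagonal blocks are zeroed and the two diagonal blocks survive. When every block belonging to a component such as an attention head is zeroed, that component can be removed outright, which is how block-level sparsity can translate into a model that is smaller and faster.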