AI Machine Learning & Data Science Research

‘Train Large, Then Compress’ – UC Berkeley BAIR Improves Large Transformer Model Training and Inference

Researchers from the Berkeley Artificial Intelligence Research (BAIR) Lab at UC Berkeley explored the effect of Transformer model size on training and inference efficiency.

In the current state of deep learning, the methods available for improving model accuracy basically come down to increasing model size, dataset size, or the number of training steps. These methods, however, demand large and very expensive compute resources, so optimizing compute efficiency has become a key goal for researchers working under resource constraints. How can higher accuracy be achieved with limited hardware and limited training time?

To address this question, researchers from the Berkeley Artificial Intelligence Research (BAIR) Lab at UC Berkeley explored the effect of Transformer model size on training and inference efficiency. Their new paper shows that with limited resources, training and inference efficiency can be improved by significantly increasing the size of Transformer models and then heavily compressing them.

Under the usual presumption that models are trained to convergence, only small models that are fast-to-execute are feasible in resource-constrained settings. The work shows that the most compute-efficient training scheme is instead to train very large models, stop them well short of convergence, and then heavily compress them to meet test-time constraints.

The researchers conducted several experiments and found that, for a given training time, deeper RoBERTa models (RoBERTa is an optimized BERT pretraining approach) reached lower perplexity than models with fewer layers, and wider RoBERTa models likewise reached lower perplexity than narrower ones.
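For readers unfamiliar with the metric, perplexity is simply the exponential of the average per-token cross-entropy loss, so lower values mean the language model assigns higher probability to held-out text. A minimal sketch of the computation in PyTorch (illustrative only, not the authors' code) follows:

    import math
    import torch
    import torch.nn.functional as F

    def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
        """Perplexity = exp(mean per-token cross-entropy).

        logits:  (num_tokens, vocab_size) unnormalized model outputs
        targets: (num_tokens,) gold token ids
        """
        loss = F.cross_entropy(logits, targets, reduction="mean")
        return math.exp(loss.item())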

The researchers also evaluated the validation BLEU scores of models of different sizes when training an English-French Transformer machine translation model. BLEU is an automatic evaluation metric for machine translation (the higher, the better). For the same training time, deeper and wider models outperformed smaller models. The researchers also found that increasing model width or depth sped up RoBERTa pretraining, and that wider models work better on machine translation tasks.
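For reference, BLEU can be computed with an off-the-shelf library; the toy snippet below uses the sacrebleu package (an assumption for illustration, not the authors' evaluation setup) to score a single hypothesis against a single reference:

    import sacrebleu  # pip install sacrebleu

    # One system output and one reference translation (toy data).
    hypotheses = ["the cat sat on the mat"]
    references = [["the cat is sitting on the mat"]]  # one reference stream

    score = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {score.score:.1f}")  # higher is better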

Although training a larger model can be more compute-efficient, it also raises the computation and memory cost of inference, and in most practical applications the total cost of inference is much higher than the training cost. The "Train Large, Then Compress" approach addresses this problem: the researchers used compression techniques such as quantization and pruning, both of which can reduce inference latency and memory requirements.
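As a rough illustration of these two techniques, the sketch below applies magnitude pruning and dynamic post-training quantization to a Hugging Face RoBERTa classifier with standard PyTorch utilities; the checkpoint name and the 60 percent sparsity level are illustrative assumptions rather than the paper's exact settings:

    import torch
    import torch.nn.utils.prune as prune
    from transformers import RobertaForSequenceClassification

    # Illustrative checkpoint; the paper pretrains its own RoBERTa variants.
    model = RobertaForSequenceClassification.from_pretrained("roberta-base")

    # 1) Magnitude pruning: zero out the smallest 60% of weights in each
    #    Linear layer, then make the mask permanent.
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.6)
            prune.remove(module, "weight")

    # 2) Dynamic quantization: store Linear weights as int8 and quantize
    #    activations on the fly at inference time.
    quantized_model = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )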

In the case of RoBERTa, the researchers first pretrained RoBERTa models of different sizes for the same amount of time, then fine-tuned these models on a downstream text classification task and applied pruning or quantization for compression. They found that, for a given test-time budget, increasing model size and then applying heavy compression worked best.

The researchers note that this preliminary investigation was limited to the field of natural language processing, and say their conclusions could be further explored in other fields in the future.

The paper Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers is on arXiv.


Author: Herin Zhao | Editor: Michael Sarazen
