ChatGPT-like models have revolutionized the field of artificial intelligence with their remarkable ability to solve real-world tasks such as summarization, coding, and translation, matching or even surpassing the performance of human experts. Despite these impressive capabilities, there is still no accessible end-to-end Reinforcement Learning with Human Feedback (RLHF) pipeline for training ChatGPT-like models.
In the new paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales, a Microsoft DeepSpeed research team presents DeepSpeed-Chat, a novel end-to-end RLHF pipeline that provides easy-to-use training and inference for ChatGPT-like models while delivering unparalleled efficiency and scalability for training models with hundreds of billions of parameters.
The team summarizes the proposed DeepSpeed-Chat with the following three capabilities:
- Easy-to-use Training and Inference Experience for ChatGPT-Like Models.
- DeepSpeed-RLHF Pipeline that replicates the training pipeline from the InstructGPT paper with careful attention to ensure completeness and one-to-one correspondence.
- DeepSpeed-RLHF System that combines the training and inference prowess of DeepSpeed into a single unified Hybrid Engine (DeepSpeed-HE) for RLHF.
The team starts by showing how easy it is to train OPT-13B and OPT-66B models with the DeepSpeed-RLHF system, as well as how to leverage the DeepSpeed-Chat RLHF APIs to customize user-defined pipelines. Specifically, only one script is needed to complete all three stages: 1) Supervised Fine-tuning (SFT), 2) Reward Model Fine-tuning, and 3) RLHF, to build a user's own ChatGPT-like model. They also provide flexible APIs that offer a general interface and backend, enabling users to build their own RLHF training pipelines with ease.
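The three stages can be sketched as a minimal, illustrative skeleton in plain Python. All function names and bodies here are placeholders to show the stage ordering only; the actual DeepSpeed-Chat training script and its arguments differ.

```python
# Illustrative skeleton of the three-stage RLHF recipe (SFT -> reward
# model -> RLHF). Function bodies are placeholders, not real training.

def supervised_finetune(base_model, demonstration_data):
    """Stage 1: fine-tune the base model on human demonstrations."""
    return {"weights": base_model, "stage": "sft"}

def train_reward_model(base_model, comparison_data):
    """Stage 2: fit a reward model on human preference comparisons."""
    return {"weights": base_model, "stage": "reward"}

def rlhf_finetune(actor, reward_model, prompts):
    """Stage 3: optimize the SFT actor against the reward model (PPO)."""
    return {"weights": actor["weights"], "stage": "rlhf"}

def run_pipeline(base_model, demos, comparisons, prompts):
    actor = supervised_finetune(base_model, demos)
    reward = train_reward_model(base_model, comparisons)
    return rlhf_finetune(actor, reward, prompts)

final = run_pipeline("opt-13b", [], [], [])
print(final["stage"])  # prints "rlhf": the pipeline ends at stage 3
```

The point of the single-script design is exactly this composition: one entry point runs the three stages in order, so users do not have to wire the stages together themselves.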
Moreover, the researchers combine the full system capabilities of DeepSpeed Training and Inference into a unified architecture they call the Hybrid Engine. The engine uses a lightweight memory management system to significantly boost throughput, and enables memory optimization techniques that deliver high training efficiency. It also supports tensor parallelism and a ZeRO-based sharding mechanism that cuts substantial costs, delivering unparalleled scale and system efficiency for RLHF workloads.
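Conceptually, the Hybrid Engine lets a single model switch between an inference-optimized mode (for generating RLHF experience) and a training mode (for the subsequent update step), instead of maintaining two separate systems. The toy class below sketches only that mode-switching idea; the class and method names are hypothetical and do not reflect the DeepSpeed API, and the real engine additionally reshards weights (ZeRO partitions for training, tensor-parallel layouts for inference) when switching.

```python
class ToyHybridEngine:
    """Toy illustration of one engine serving both RLHF phases.

    Only the mode toggle is modeled here; the real DeepSpeed Hybrid
    Engine also reorganizes model weights and memory on each switch.
    """

    def __init__(self, model_name):
        self.model_name = model_name
        self.mode = "train"

    def eval(self):
        # Switch to inference-optimized mode for experience generation.
        self.mode = "inference"

    def train(self):
        # Switch back to training mode for the update step.
        self.mode = "train"

    def rlhf_step(self, prompt):
        self.eval()
        response = f"response to {prompt}"   # generation phase
        self.train()
        return response                      # followed by an update phase


engine = ToyHybridEngine("opt-13b")
engine.rlhf_step("Explain RLHF")
print(engine.mode)  # prints "train": each step ends back in training mode
```

Because every RLHF iteration alternates between generation and training, avoiding a costly hand-off between two separate systems at each switch is where the throughput gains come from.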
Overall, the DeepSpeed-Chat system offers easy, efficient, affordable and highly scalable RLHF training of ChatGPT-like models. The team has open-sourced DeepSpeed-Chat and is open to collaborations with the AI community on applying DeepSpeed to real-world applications.
Author: Hecate He | Editor: Chain Zhang