ChatGPT-like models have revolutionized the field of artificial intelligence with their remarkable ability to solve real-world tasks such as summarization, coding, and translation, matching or even surpassing the performance of human experts. Despite these impressive capabilities, there is still no accessible end-to-end Reinforcement Learning with Human Feedback (RLHF) pipeline for training ChatGPT-like models.
In the new paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales, a Microsoft DeepSpeed research team presents DeepSpeed-Chat, a novel end-to-end RLHF pipeline that provides easy-to-use training and inference for ChatGPT-like models while delivering unparalleled efficiency and scalability for training models with hundreds of billions of parameters.
The team summarizes the proposed DeepSpeed-Chat with the following three capabilities:
- Easy-to-use Training and Inference Experience for ChatGPT-Like Models.
- DeepSpeed-RLHF Pipeline that replicates the training pipeline from the InstructGPT paper with careful attention to ensure completeness and one-to-one correspondence.
- DeepSpeed-RLHF System that combines the training and inference prowess of DeepSpeed into a single unified Hybrid Engine (DeepSpeed-HE) for RLHF.
The team starts by showing how easy it is to train OPT-13B and OPT-66B models with the DeepSpeed-RLHF system, as well as how to leverage the DeepSpeed-Chat RLHF APIs to customize user-defined pipelines. Specifically, only one script is needed to complete all three stages: 1) Supervised Fine-tuning (SFT), 2) Reward Model Fine-tuning, and 3) RLHF, to build a user's own ChatGPT-like model. They also provide flexible APIs that offer a general interface and backend, enabling users to build their own RLHF training pipelines with ease.
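The three stages can be sketched as a minimal, illustrative skeleton in plain Python. All function names and bodies here are placeholders to show the stage ordering only; the actual DeepSpeed-Chat training script and its arguments differ.

```python
# Illustrative skeleton of the three-stage RLHF recipe (SFT -> reward
# model -> RLHF). Function bodies are placeholders, not real training.

def supervised_finetune(base_model, demonstration_data):
    """Stage 1: fine-tune the base model on human demonstrations."""
    return {"weights": base_model, "stage": "sft"}

def train_reward_model(base_model, comparison_data):
    """Stage 2: fit a reward model on human preference comparisons."""
    return {"weights": base_model, "stage": "reward"}

def rlhf_finetune(actor, reward_model, prompts):
    """Stage 3: optimize the SFT actor against the reward model (PPO)."""
    return {"weights": actor["weights"], "stage": "rlhf"}

def run_pipeline(base_model, demos, comparisons, prompts):
    actor = supervised_finetune(base_model, demos)
    reward = train_reward_model(base_model, comparisons)
    return rlhf_finetune(actor, reward, prompts)

final = run_pipeline("opt-13b", [], [], [])
print(final["stage"])  # prints "rlhf": the pipeline ends at stage 3
```

The point of the single-script design is exactly this composition: one entry point runs the three stages in order, so users do not have to wire the stages together themselves.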
Moreover, the researchers combine the full system capabilities of DeepSpeed Training and Inference into a unified architecture they call the Hybrid Engine. The engine uses a lightweight memory management system to significantly boost throughput, and enables memory optimization techniques that deliver high training efficiency. It also supports tensor parallelism and a ZeRO-based sharding mechanism that cuts substantial costs, delivering unparalleled scale and system efficiency for RLHF workloads.
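Conceptually, the Hybrid Engine lets a single model switch between an inference-optimized mode (for generating RLHF experience) and a training mode (for the subsequent update step), instead of maintaining two separate systems. The toy class below sketches only that mode-switching idea; the class and method names are hypothetical and do not reflect the DeepSpeed API, and the real engine additionally reshards weights (ZeRO partitions for training, tensor-parallel layouts for inference) when switching.

```python
class ToyHybridEngine:
    """Toy illustration of one engine serving both RLHF phases.

    Only the mode toggle is modeled here; the real DeepSpeed Hybrid
    Engine also reorganizes model weights and memory on each switch.
    """

    def __init__(self, model_name):
        self.model_name = model_name
        self.mode = "train"

    def eval(self):
        # Switch to inference-optimized mode for experience generation.
        self.mode = "inference"

    def train(self):
        # Switch back to training mode for the update step.
        self.mode = "train"

    def rlhf_step(self, prompt):
        self.eval()
        response = f"response to {prompt}"   # generation phase
        self.train()
        return response                      # followed by an update phase


engine = ToyHybridEngine("opt-13b")
engine.rlhf_step("Explain RLHF")
print(engine.mode)  # prints "train": each step ends back in training mode
```

Because every RLHF iteration alternates between generation and training, avoiding a costly hand-off between two separate systems at each switch is where the throughput gains come from.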
Overall, the DeepSpeed-Chat system offers easy, efficient, affordable and highly scalable RLHF training of ChatGPT-like models. The team has open-sourced DeepSpeed-Chat and is open to collaborations with the AI community on applying DeepSpeed to real-world applications.
Author: Hecate He | Editor: Chain Zhang