AI Machine Learning & Data Science Research

Microsoft Releases DeepSpeed-Chat for RLHF Training of ChatGPT-like Models

ChatGPT-like models have revolutionized the artificial intelligence world with their incredible capabilities for solving real-world tasks such as summarization, coding, and translation, matching or even surpassing the performance of human experts. Despite these impressive capabilities, there remains a lack of an end-to-end Reinforcement Learning from Human Feedback (RLHF) pipeline for training ChatGPT-like models.

In a new paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales, the DeepSpeed team at Microsoft Research presents DeepSpeed-Chat, a novel end-to-end RLHF pipeline that provides easy-to-use training and inference for ChatGPT-like models while delivering unparalleled efficiency and scalability for training models with hundreds of billions of parameters.

The team summarizes the proposed DeepSpeed-Chat with the following three capabilities:

  1. Easy-to-use training and inference experience for ChatGPT-like models.
  2. A DeepSpeed-RLHF pipeline that replicates the training pipeline from the InstructGPT paper, with careful attention to ensuring completeness and one-to-one correspondence.
  3. A DeepSpeed-RLHF system that combines the training and inference prowess of DeepSpeed into a single unified Hybrid Engine (DeepSpeed-HE) for RLHF.

The team starts by showing how easy it is to train OPT-13B and OPT-66B models with the DeepSpeed-RLHF system, and how to leverage the DeepSpeed-Chat RLHF APIs to customize user-defined pipelines. Specifically, only one script is needed to complete all three stages: 1) supervised fine-tuning (SFT), 2) reward model fine-tuning, and 3) RLHF, to build the user's own ChatGPT-like model. They also provide flexible APIs that give users a general interface and backend to build their own RLHF training pipelines with ease.
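The three stages the script automates can be illustrated with a toy numeric sketch. This is purely illustrative pseudologic, not DeepSpeed-Chat's actual API: the real pipeline fine-tunes transformer models on text, whereas here each "model" is a single number, so only the stage ordering and data flow carry over.

```python
# Toy sketch of the three RLHF stages (SFT -> reward model -> RLHF).
# All names and arithmetic are illustrative stand-ins, not DeepSpeed-Chat code.

def supervised_finetune(base, demos):
    # Stage 1: nudge the base "model" toward human demonstrations.
    return base + sum(demos) / len(demos)

def train_reward_model(ranked_pairs):
    # Stage 2: fit a scorer that prefers the human-chosen response.
    # ranked_pairs: list of (preferred_score, rejected_score) tuples.
    margin = sum(p - r for p, r in ranked_pairs) / len(ranked_pairs)
    return lambda x: margin * x

def rlhf_tune(policy, reward_fn, steps=10, lr=0.1):
    # Stage 3: ascend the reward signal (a stand-in for PPO updates).
    for _ in range(steps):
        policy += lr * reward_fn(1.0)
    return policy

sft = supervised_finetune(0.0, [1.0, 3.0])          # stage 1 output: 2.0
rm = train_reward_model([(2.0, 1.0), (3.0, 1.0)])   # learned margin: 1.5
final = rlhf_tune(sft, rm)                          # stage 3 output: ~3.5
```

In DeepSpeed-Chat, each stage corresponds to a real fine-tuning run, and the single launcher script chains them so the output checkpoint of one stage feeds the next.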

Moreover, the researchers combine the full system capabilities of DeepSpeed training and inference into a unified architecture they call the Hybrid Engine. The engine uses a lightweight memory management system to significantly boost throughput, and it enables memory optimization techniques that deliver high training efficiency. It also supports tensor parallelism and a ZeRO-based sharding mechanism, cutting substantial costs to deliver unparalleled scale and system efficiency for RLHF workloads.
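In DeepSpeed, features like ZeRO sharding are typically enabled through a JSON configuration file. The fragment below is a hedged sketch of what such a configuration might look like; the key names and values are illustrative (the `hybrid_engine` section in particular is an assumption), so consult the DeepSpeed-Chat repository for the exact schema.

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "offload_param": { "device": "cpu" }
  },
  "hybrid_engine": {
    "enabled": true,
    "inference_tp_size": 1
  }
}
```

ZeRO stage 3 shards parameters, gradients, and optimizer states across GPUs, which is what allows RLHF training to scale to models with hundreds of billions of parameters on commodity clusters.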

Overall, the DeepSpeed-Chat system offers easy, efficient, affordable, and highly scalable RLHF training of ChatGPT-like models. The team has open-sourced DeepSpeed-Chat and is open to collaborations with the AI community on applying DeepSpeed to real-world applications.

The code is available on the project’s GitHub. The paper DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales is on arXiv.


Author: Hecate He | Editor: Chain Zhang

