Large Foundation Models (LFMs) such as ChatGPT and GPT-4 have demonstrated impressive zero-shot learning capabilities on a wide range of tasks. Their success can be credited to the scaling of model and dataset size, and to the fine-tuning process that aligns them with user intent.
As these models continue to thrive, an intriguing question arises: can these models supervise their own behavior, or that of other models, without much human intervention?
To answer this question, there has been an influx of research using LFMs as teachers to generate datasets for training smaller models. The resulting student models, however, typically have poorer reasoning and comprehension skills than their teachers.
To address this issue, in a new paper Orca: Progressive Learning from Complex Explanation Traces of GPT-4, a Microsoft research team introduces Orca, a 13-billion parameter model that learns from GPT-4's explanation traces, step-by-step thought processes, and complex instructions, and significantly outperforms existing state-of-the-art instruction-tuned models.
The team makes three key contributions (explanation tuning, scaling tasks and instructions, and evaluation) to address the current challenges of instruction-tuned models in terms of task diversity, query complexity, and data scaling.
In explanation tuning, the researchers observe that query and response pairs from GPT-4 can provide valuable signals for a student model's learning. Therefore, they augment the pairs with detailed responses that better explain the teacher's reasoning process as it generates a response.
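To make the idea of explanation tuning more concrete, the following is a minimal sketch of what an augmented training record might look like. The `ExplainedExample` schema, the `to_training_text` helper, the field labels, and the arithmetic example are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ExplainedExample:
    """One training record for a student model (hypothetical schema)."""
    query: str
    explanation: str  # the teacher's step-by-step reasoning
    answer: str       # the teacher's final response


def to_training_text(example: ExplainedExample) -> str:
    """Serialize a record so the student sees the reasoning
    before the answer, rather than the answer alone."""
    return (
        f"Query: {example.query}\n"
        f"Reasoning: {example.explanation}\n"
        f"Answer: {example.answer}"
    )


# Placeholder record; the content is made up for illustration.
example = ExplainedExample(
    query="What is 12 * 15?",
    explanation="12 * 15 = 12 * 10 + 12 * 5 = 120 + 60.",
    answer="180",
)
print(to_training_text(example))
```

The point of the sketch is only that the target text contains the reasoning as well as the final answer, which gives the student more to learn from than a bare query and response pair.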
In scaling tasks and instructions, they sample from the task collection of the Flan 2022 Collection to obtain a diverse mixture of tasks, then further sub-sample to produce complex prompts. These prompts are used to query LFMs, producing a rich and diverse training set.
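The two-stage sampling described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `sample_mixture` function, the task names, and the query strings are hypothetical, and the second stage here is plain uniform sub-sampling rather than any selection for prompt complexity.

```python
import random


def sample_mixture(tasks, per_task, total, seed=0):
    """Draw up to `per_task` queries from every task, then sub-sample
    `total` queries from the pooled result (hypothetical helper)."""
    rng = random.Random(seed)
    pooled = []
    for name in sorted(tasks):  # sorted for reproducibility
        queries = tasks[name]
        k = min(per_task, len(queries))
        pooled.extend((name, q) for q in rng.sample(queries, k))
    return rng.sample(pooled, min(total, len(pooled)))


# Placeholder task collection; names and queries are made up.
tasks = {
    "arithmetic": [f"arith-{i}" for i in range(50)],
    "summarization": [f"summ-{i}" for i in range(30)],
    "translation": [f"trans-{i}" for i in range(5)],
}
mixture = sample_mixture(tasks, per_task=10, total=20)
print(len(mixture))
```

Capping the draw per task keeps large tasks from dominating the pool, which is one simple way to keep the mixture diverse before the final sub-sample.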
Finally, they provide a thorough evaluation to assess the generative, reasoning, and comprehension abilities of Orca and compare it against strong baselines, including Text-Davinci-003, ChatGPT, GPT-4 and Vicuna. Orca outperforms SOTA instruction-tuned models such as Vicuna-13B by more than 100% on BigBench Hard (BBH), and demonstrates competitive performance on academic exams in a zero-shot setting.
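For readers unsure how to interpret a gain of "more than 100%", the short sketch below shows the usual relative-improvement calculation, assuming the figure is a relative gain over the baseline score. The `relative_improvement` helper is hypothetical and the scores are placeholders, not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage gain of `new` over `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Placeholder scores, not results from the paper.
print(relative_improvement(20.0, 45.0))  # 125.0
```

Under this reading, any relative improvement above 100% means the new score is more than double the baseline.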
Overall, this work verifies that learning from step-by-step explanations has great potential to improve model performance.
The paper Orca: Progressive Learning from Complex Explanation Traces of GPT-4 is on arXiv.
Author: Hecate He | Editor: Chain Zhang