Pretrained large language models (LLMs) are now scaled to more than 100B parameters and have revolutionized the field of natural language processing (NLP) with their excellent few-shot and zero-shot learning capabilities. However, although state-of-the-art LLMs make short work of system-1 tasks, they still struggle on system-2 tasks that require slow, multi-step reasoning.
A research team from the University of Tokyo and Google Brain addresses this deficiency in their new paper Large Language Models are Zero-Shot Reasoners, which demonstrates that LLMs can become decent zero-shot reasoners through the addition of a simple prompt — “Let’s think step by step” — that elicits a step-by-step thinking process before each question is answered. The resulting Zero-shot-CoT (zero-shot chain-of-thought prompting) method achieves huge performance gains compared to the zero-shot baseline.
The division of human thinking into fast/automatic (system-1) and slow/rational (system-2) processes was proposed in the 2011 bestseller Thinking, Fast and Slow by psychologist Daniel Kahneman and has been widely adopted by machine learning researchers seeking to endow their models with more advanced and humanlike reasoning capabilities.
The proposed Zero-shot-CoT is a zero-shot template-based prompting approach for chain-of-thought reasoning that, unlike conventional methods, does not require human engineering of prompt examples. Zero-shot-CoT uses an initial prompt for reasoning and a second prompt for answer extraction, enabling it to generate a plausible reasoning path in a zero-shot manner and obtain correct answers where standard zero-shot approaches often fail. It is also versatile and task-agnostic, making it applicable in areas ranging from arithmetic and symbolic tasks to common-sense reasoning.
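The two-stage prompting described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `llm` callable is a hypothetical stand-in for any text-completion API, and the answer-extraction phrase is one of several the paper tailors to the task type.

```python
def zero_shot_cot(question: str, llm) -> str:
    """Two-stage Zero-shot-CoT prompting sketch.

    `llm` is a hypothetical callable mapping a prompt string to a
    completion string (e.g. a wrapper around a text-completion API).
    """
    # Stage 1: append the trigger phrase to elicit a reasoning path.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm(reasoning_prompt)

    # Stage 2: feed the generated reasoning back and prompt the model
    # to extract the final answer in the desired format.
    extraction_prompt = (
        f"{reasoning_prompt} {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return llm(extraction_prompt)
```

Because the trigger phrase is task-agnostic, the same template applies unchanged across arithmetic, symbolic, and common-sense benchmarks; only the answer-extraction phrase may vary with the expected answer format.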
In their empirical study, the team compared Zero-shot-CoT with prompting baselines that included standard Few-shot (Brown et al., 2020), Few-shot-CoT (Wei et al., 2022) and standard Zero-shot. In the evaluations, the proposed Zero-shot-CoT achieved astounding performance improvements compared to the zero-shot baseline — boosting accuracy from 17.7 percent to 78.7 percent on MultiArith and from 10.4 percent to 40.7 percent on GSM8K.
Overall, this work demonstrates the potential of LLMs as zero-shot reasoners, and the team hopes it will encourage further research aimed at fully realizing and exploiting the high-level and multi-task zero-shot capabilities inside such models.
The paper Large Language Models are Zero-Shot Reasoners is on arXiv.
Author: Hecate He | Editor: Michael Sarazen