AI Machine Learning & Data Science Research

OpenAI’s Statement Curriculum Learning Method Cracks High School Olympiad Level Mathematics Problems

An OpenAI research team presents an expert iteration-based neural theorem prover capable of solving a curriculum of increasingly difficult mathematical problems (such as high-school olympiad-level problems) from a set of formal statements of sufficiently varied difficulty and without the need for associated ground-truth proofs.

Deep learning approaches have achieved impressive performance on many complex tasks, but still struggle with tasks that require extensive planning and symbolic reasoning. This is especially true in formal mathematics, where an infinite action space and the lack of a natural self-play setup make naive applications of reinforcement learning (RL) unlikely to succeed.

To address this issue, a research team from OpenAI, France’s École Polytechnique and the University of Cambridge has proposed an expert iteration-based neural theorem prover that is capable of solving a curriculum of increasingly difficult mathematics problems (such as challenging high-school olympiad-level problems) without the need for associated ground-truth proofs.

The team summarizes their study’s contributions as:

  1. We present lean-gym, a simple REPL interface for interacting with the Lean theorem prover.
  2. We propose an expert iteration methodology for GPT-f (Polu & Sutskever, 2020) which uses proofs generated by our models as training data to iteratively improve their performance.
  3. We demonstrate that, at a fixed compute budget, expert iteration outperforms proof search only.
  4. We present a synthetic inequality generator and study how expert iteration finds and solves a curriculum of increasingly difficult problems from a set of generated statements of various difficulty.
  5. We present a manually curated set of formalized problem statements and leverage it to achieve state-of-the-art performance on the miniF2F benchmark.
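The first contribution, lean-gym, wraps the Lean prover in a request/response loop: open a search for a statement, apply tactics to goals, and observe the resulting subgoals. The sketch below mimics that interaction pattern with a toy prover; the class, method names and tactic semantics are illustrative stand-ins, not lean-gym's actual API.

```python
# Toy sketch of a lean-gym-style REPL: the prover holds search states,
# and each tactic application maps one open goal to zero or more subgoals.
# Names and tactic semantics here are illustrative, not lean-gym's real API.

class ToyProverREPL:
    def __init__(self):
        self.searches = {}   # search_id -> list of open goals
        self.next_id = 0

    def init_search(self, statement):
        """Open a proof search for a formal statement; return its id and root goal."""
        sid = self.next_id
        self.next_id += 1
        self.searches[sid] = [statement]
        return sid, statement

    def run_tac(self, sid, goal, tactic):
        """Apply a tactic string to a goal; return remaining goals ([] = proof closed)."""
        goals = self.searches[sid]
        if goal not in goals:
            return {"error": "unknown goal"}
        goals.remove(goal)
        # Toy semantics: "intro" peels "a -> b" down to "b"; "triv" closes "True".
        if tactic == "intro" and " -> " in goal:
            goals.append(goal.split(" -> ", 1)[1])
        elif tactic == "triv" and goal == "True":
            pass  # goal discharged
        else:
            goals.append(goal)  # tactic failed; goal stays open
        return {"open_goals": list(goals)}

repl = ToyProverREPL()
sid, root = repl.init_search("True -> True")
state = repl.run_tac(sid, root, "intro")
state = repl.run_tac(sid, state["open_goals"][0], "triv")
print(state["open_goals"])  # an empty list means the proof is complete
```

The important design point is that the interface is stateless from the model's perspective: each step is just (goal in, subgoals out), which is what makes it easy to drive from a language model.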

The proposed method relies heavily on the 2020 GPT-f paper (Generative Language Modeling for Automated Theorem Proving, Polu & Sutskever), which studies the use of language models to generate tactics; and the 2021 PACT paper (Proof Artifact Co-training for Theorem Proving with Language Models, Han et al.) which applies GPT-f to the Lean interactive theorem prover (de Moura et al., 2015) and studies the benefits of co-training on self-supervised objectives.

The researchers first explore the use of expert iteration in the context of language modelling applied to formal mathematics. Expert iteration, introduced by Silver et al. in 2017, interleaves search with learning: the model's search procedure generates new data, and retraining on that data improves the model for the next round of search. The expert iteration methodology developed in this paper covers both the models and their pretraining strategies.
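The expert iteration loop can be sketched in a few lines: attempt every statement, keep only verified successes as new training data, and let the improved model unlock statements it previously failed on. In this deterministic toy, a single competence score stands in for the language model and a per-statement difficulty stands in for Lean verification; all names and numbers are illustrative.

```python
# Toy expert-iteration loop: attempt every statement with the current "model",
# keep verified successes, and fold them back into the training set.
# A single competence score stands in for LM sampling + Lean proof checking.

statements = {f"stmt_{i}": i / 10 for i in range(10)}  # name -> difficulty
competence = 0.05                                      # proxy for model skill
train_set = []                                         # verified (stmt, proof) pairs
solved = set()

for iteration in range(6):
    newly_solved = [s for s, d in statements.items()
                    if s not in solved and d <= competence]
    for s in newly_solved:
        solved.add(s)
        train_set.append((s, f"proof_of_{s}"))
    # "Retraining" on the newly found proofs improves the model,
    # which unlocks harder statements in later iterations.
    competence += 0.1 * len(newly_solved)
    print(f"iter {iteration}: solved {len(newly_solved)} new, total {len(solved)}")
```

Each iteration solves exactly the statements within reach of the current model, and retraining pushes the reachable frontier outward, which is the continuous-improvement behaviour expert iteration relies on.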

The team starts with a decoder-only transformer model similar to GPT-3 and pretrains their models successively on GPT-3’s postprocessed version of the CommonCrawl corpus and an updated version of WebMath (Polu & Sutskever, 2020). Training then targets two objectives: a proofstep objective, which generates a proofstep (a Lean tactic) given a goal (a Lean tactic state); and a proofsize objective, which generates a token representing a proof-size estimate bucket for the current goal and is used to guide proof search.
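Both objectives can be serialized as plain-text training examples for a language model. The sketch below is in the spirit of GPT-f-style prompts; the exact prompt strings and the bucketing scheme are illustrative assumptions, not the paper's verbatim format.

```python
# Illustrative serialization of the two training objectives as text,
# in the spirit of GPT-f prompts (the paper's exact formats may differ).

def proofstep_example(goal, tactic):
    """Proofstep objective: predict a Lean tactic given the tactic state."""
    return f"GOAL {goal} PROOFSTEP {tactic}"

def proofsize_bucket(n_steps, n_buckets=11):
    """Map a remaining-proof-length estimate to a small token vocabulary.
    Bucket 0 = no proof found / very long; higher buckets = shorter proofs.
    The bucket count and mapping here are assumptions for illustration."""
    if n_steps is None:
        return 0
    return max(1, n_buckets - 1 - min(n_steps, n_buckets - 2))

def proofsize_example(goal, n_steps):
    """Proofsize objective: predict a bucket token for the current goal."""
    return f"GOAL {goal} PROOFSIZE <{proofsize_bucket(n_steps)}>"

print(proofstep_example("⊢ a + b = b + a", "exact add_comm a b"))
print(proofsize_example("⊢ a + b = b + a", 1))
```

During proof search, the predicted proofsize bucket acts as a value estimate: goals predicted to have short remaining proofs are expanded first.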

The team then runs their expert iteration approach on synthetic statements generated by an inequality generator. These synthetic statements enable the researchers to control the difficulty of each statement to reveal whether expert iteration can hill-climb the intrinsic difficulty gradient of the resulting set of statements. The results show that with a fixed compute budget, expert iteration can eventually close proofs even for hard statements.
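A generator of this kind needs a knob that controls statement difficulty. The toy below uses nesting depth as that knob, composing nonnegativity-preserving operations so every generated inequality is true; the construction is loosely in the spirit of the paper's inequality generator, and the specific grammar is an illustrative assumption.

```python
import random

# Toy synthetic statement generator with a tunable difficulty knob:
# depth controls how many rewriting steps compose the expression, serving
# as a proxy for proof difficulty. Each step preserves nonnegativity,
# so every generated statement is true. Purely illustrative.

random.seed(1)

def gen_inequality(depth):
    """Build a nested nonnegative expression and state 0 <= expr."""
    expr = "x*x"  # base case: a trivially nonnegative term
    for _ in range(depth):
        op = random.choice(["+ y*y", "+ 1", "* 2"])
        expr = f"({expr} {op})"
    return f"∀ x y : ℝ, 0 ≤ {expr}"

# A pool with mixed difficulty lets expert iteration solve the shallow
# statements first and use them as training signal for the deeper ones.
pool = [gen_inequality(d) for d in range(4) for _ in range(2)]
for stmt in pool[:3]:
    print(stmt)
```

Sweeping the depth parameter is what produces "a set of generated statements of various difficulty" with an intrinsic gradient for expert iteration to hill-climb.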

In their evaluations, the team used the miniF2F (Zheng et al., 2021) formal mathematics benchmark, which contains 244 validation and 244 test formalized statements of mathematical problems drawn from the AMC, AIME and IMO competitions as well as high-school and undergraduate maths classes. The proposed expert iteration-based approach achieved state-of-the-art results on this benchmark (41.2 percent vs the previous 29.3 percent), validating its ability to automatically solve multiple challenging high-school olympiad-level problems.

The team hopes that their proposed statement curriculum learning methodology will help accelerate progress in automated reasoning, especially if scaled with automated generation and curation of formal statements in the future.

The paper Formal Mathematics Statement Curriculum Learning is on arXiv.


Author: Hecate He | Editor: Michael Sarazen


