The November release of ChatGPT garnered unprecedented public and media attention. OpenAI’s conversational large language model (LLM) was widely applauded for its ability to answer complex queries, generate correct computer code and coherent long-form essays, and even solve math problems. But might that last claim have been premature?
In the new paper Mathematical Capabilities of ChatGPT, a research team from the University of Oxford, TU Wien, University of Cambridge, University of Vienna, and Princeton University tests ChatGPT’s mathematical capabilities on publicly available and hand-crafted datasets and evaluates its suitability as an assistant to professional mathematicians. The team concludes that despite the glowing media reviews, ChatGPT’s mathematical abilities “are significantly below those of an average mathematics graduate student.”
The team summarizes their main contributions as follows:
- We provide insight into ChatGPT’s mathematical use, showing for which types of questions and which domains of mathematics it may be useful and how it could be integrated into a mathematician’s workflow.
- We identify ChatGPT’s failure modes and the limits of its capabilities, which can aid future efforts to develop LLMs that perform better in mathematics.
- We provide benchmarks for testing the mathematical capabilities of future LLMs so that they can be compared to ChatGPT across multiple dimensions of advanced mathematical comprehension.
To effectively evaluate ChatGPT on advanced math problems, the researchers build a new dataset, GHOSTS, comprising a total of 728 prompts in six carefully crafted subdatasets: Grad-Text, Holes-in-Proofs, Olympiad-Problem-Solving, Symbolic-Integration, MATH, and Search-Engine-Aspects. The researchers say the GHOSTS subdatasets surpass publicly available benchmark mathematical datasets in terms of sophistication and reasoning difficulty.
The researchers use LaTeX to encode the mathematical inputs for most of their subdatasets, which span four levels of ascending difficulty: 1) elementary arithmetic problems, 2) symbolic problems, 3) (under)graduate-level exercises from well-known textbooks and questions from math.stackexchange.com, and 4) exercises in the style of Mathematical Olympiad problems.
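As an illustration, a prompt in the style of the (under)graduate-level exercises described above might be encoded in LaTeX along the following lines. This is a hypothetical example constructed for this article, not one drawn from the GHOSTS dataset:

```latex
% Hypothetical LaTeX-encoded prompt in the style of an undergraduate
% analysis exercise; not taken from the GHOSTS dataset.
Let $f \colon [0,1] \to \mathbb{R}$ be continuous with
$\int_0^1 f(x)\,\mathrm{d}x = 0$.
Prove that there exists $c \in (0,1)$ such that $f(c) = 0$.
```

Encoding prompts this way lets the model receive unambiguous mathematical notation, the same format in which working mathematicians typically write.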
The team applied ChatGPT to the GHOSTS dataset and rated its outputs, considering output length, the stability of answers under prompt engineering, and how close they judged ChatGPT’s responses to be to the correct answers.
ChatGPT failed on most of the problems, faring especially poorly on questions requiring deep insights and original solutions such as those found in the Mathematical Olympiads. The paper concludes that while ChatGPT can effectively search for mathematical objects when given information about them, it struggles with advanced mathematics and delivering consistent, high-quality proofs or calculations.
The team hopes their work will inspire other professional mathematicians to contribute to building a more thorough benchmark for assessing and improving LLMs’ mathematical abilities.
Just one day before this paper was published, OpenAI announced it had upgraded ChatGPT with improved mathematical capabilities. It’s unclear how this latest version would perform in the experiments presented here.
The GHOSTS dataset will be released on the project’s GitHub. The paper Mathematical Capabilities of ChatGPT is on arXiv.
Author: Hecate He | Editor: Michael Sarazen