AI in Mathematics

A progress ladder showing how far AI has climbed in mathematical reasoning — from solving textbook problems to competing at the International Mathematical Olympiad, publishing in top journals, and the distant summit of open millennium-prize problems.

Legend: Achieved · Partially achieved · Current frontier · Not yet reached

Automation Progress

Technology readiness: Lab → Pilot → Commercial → Mature
Task automation rate: ~8% of human tasks in this field
People affected: ~250K mathematicians & researchers worldwide
Growth momentum: ~40% CAGR for the AI-for-science market, 2024–2030
Achieved
Level 1 — Undergraduate & Competition Math

AI systems now routinely score 90%+ on math benchmarks such as GSM8K (grade-school word problems) and MATH (high-school competition problems), and handle problems from competitions like the AMC and AIME with ease. This level is considered solved.

May 2024

GPT-4o scores 76.6% on MATH benchmark

OpenAI's GPT-4o achieved 76.6% on the MATH dataset of competition-level problems; when the benchmark was introduced in 2021, the best language models scored in the single digits.

Sep 2024

OpenAI o1 reaches 94.8% on MATH

The o1 reasoning model achieved 94.8% on MATH and 83.3% on AIME 2024 (with consensus over multiple samples), demonstrating that chain-of-thought reasoning dramatically lifts mathematical performance.

Achieved
Level 2 — International Mathematical Olympiad

AI has progressed from silver to gold at the IMO — widely considered the hardest pre-university math competition in the world, requiring deep creative reasoning and rigorous proof.

Jan 2024

AlphaGeometry approaches gold-medal level on IMO geometry

DeepMind's AlphaGeometry solved 25 of 30 historical IMO geometry problems, approaching gold-medal performance in geometry specifically.

Jul 2024

AlphaProof + AlphaGeometry2 reach IMO silver (28/42)

DeepMind's combined system solved 4 of 6 problems at the 2024 IMO, scoring 28/42, making it the first AI to officially reach silver-medal standard. AlphaProof generated formal proofs in the Lean proof assistant and was trained via reinforcement learning, though some solutions required days of computation rather than contest-length sessions.
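What a machine-verifiable proof looks like can be illustrated with a toy theorem in Lean 4 (using Mathlib's `positivity` tactic). This is an illustrative sketch, not AlphaProof's actual output; the point is that Lean's kernel checks every step, so a completed proof cannot hide a gap.

```lean
import Mathlib.Tactic

-- A trivial machine-checked statement: the sum of two real squares
-- is nonnegative. Lean accepts the proof only if every inference
-- is validated by its kernel.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  positivity
```

AlphaProof's IMO solutions are the same kind of object, just vastly larger: long chains of formally checked steps rather than prose arguments a human referee must trust.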

Jul 2025

Both OpenAI & DeepMind achieve IMO gold (35/42)

OpenAI's experimental reasoning model and DeepMind's Gemini Deep Think both scored 35/42 (5 of 6 problems), reaching gold-medal standard. Both produced natural-language proofs within the competition's two 4.5-hour exam sessions, a major leap from 2024's multi-day formal approach.

Achieved
Level 3 — Discovering New Mathematics & Top-Journal Publications

Beyond solving known problems, AI has begun producing genuinely new mathematical discoveries — finding novel conjectures, beating decades-old bounds, and publishing results in Nature.

Dec 2021

DeepMind AI discovers new knot-theory conjecture

Working with mathematicians from Oxford and Sydney, DeepMind's ML system discovered a previously unknown relationship between algebraic and geometric invariants of knots, leading to a new theorem. Published as a Nature cover story.

Oct 2022

AlphaTensor discovers faster matrix multiplication algorithms

DeepMind's AlphaTensor used reinforcement learning to discover novel matrix-multiplication algorithms, beating the 50-year-old record set by Strassen's method for 4×4 matrices (in mod-2 arithmetic) and improving on the best known algorithms for 70+ other sizes. Nature cover story.
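AlphaTensor's search target can be made concrete with Strassen's classic 1969 scheme, the kind of identity it generalizes: multiplying two 2×2 matrices with 7 scalar multiplications instead of the naive 8. A minimal Python sketch:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using Strassen's 7 multiplications
    (the naive method needs 8). Applied recursively to blocks, this
    saving compounds; AlphaTensor searched for analogous
    decompositions at larger sizes."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine the 7 products into the 4 entries of A @ B.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

AlphaTensor found, for example, a 47-multiplication scheme for 4×4 matrices over mod-2 arithmetic, beating the 49 obtained by applying Strassen's trick twice.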

Dec 2023

FunSearch breaks 20-year record on cap set problem

DeepMind's FunSearch used LLMs to discover new solutions to the cap set problem (extremal combinatorics), producing the largest cap sets found in 20 years and surpassing human mathematicians' best constructions. Published in Nature.
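The cap set problem asks for the largest subset of F_3^n containing no three points on a line; three distinct points of F_3^n are collinear exactly when their coordinate-wise sum is 0 mod 3. A minimal checker for the property FunSearch's candidate constructions must satisfy (an illustrative sketch, not FunSearch's actual code):

```python
from itertools import combinations

def is_cap_set(points):
    """Return True if no three distinct points lie on a line in F_3^n.
    In F_3^n, three distinct points are collinear iff their
    coordinate-wise sum is 0 mod 3."""
    pts = [tuple(p) for p in points]
    for a, b, c in combinations(pts, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False
    return True
```

For example, {(0,0), (0,1), (1,0), (1,1)} is a cap set in F_3^2, while {(0,0), (1,1), (2,2)} is not, since those three points sum to (0,0) mod 3. FunSearch evolved programs that generate large sets passing exactly this kind of check in higher dimensions.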

Nov 2025

AlphaProof paper published in Nature

The full AlphaProof system, demonstrating olympiad-level formal mathematical reasoning via reinforcement learning in Lean, was published in Nature — validating AI's ability to generate machine-verifiable proofs at the highest competition level.

Current Frontier
Level 4 — Research-Frontier Mathematics

Can AI tackle problems that take professional mathematicians hours or days? The FrontierMath benchmark tests exactly this — hundreds of original, expert-crafted research-level problems. The best models are still far from human expert performance, but progress is rapid.

Nov 2024

FrontierMath benchmark: SOTA models solve <2%

When launched, the best AI models (including o1) could solve less than 2% of FrontierMath's expert-level problems spanning number theory, algebraic geometry, and category theory. Fields Medalist Terence Tao called the problems "extremely challenging" even for expert mathematicians.

2025–26

Rapid progress: GPT-5 series reaches ~26%, GPT-5.4 hits 47.6%

Within a year, performance surged: GPT-5 scored ~26.3%, and the latest GPT-5.4 solves 47.6% of FrontierMath problems — a remarkable leap, though still below expert human level on the hardest problems.

Ongoing

AI as research co-pilot (Tao's vision)

Fields Medalist Terence Tao envisions AI as a "co-pilot" for mathematical research — not replacing mathematicians but accelerating discovery by handling tedious computations, suggesting proof strategies, and formalizing arguments in proof assistants like Lean.

Not yet reached
Level 5 — Open Conjectures & Millennium Prize Problems

The summit: solving problems that have resisted all human efforts for decades or centuries. The seven Millennium Prize Problems (six remain unsolved) represent the pinnacle of mathematical difficulty. No AI has made meaningful direct progress on any of them. These problems likely require entirely new mathematical frameworks, not just better computation.

Open

Riemann Hypothesis (1859)

Concerns the distribution of prime numbers. Unproven for over 160 years. Widely considered one of the deepest unsolved problems in all of mathematics. No AI system has made direct progress.

Open

P vs NP (1971)

The most important open question in theoretical computer science. Asks whether every problem whose solution can be quickly verified can also be quickly solved. Resolving it would reshape cryptography, optimization, and our understanding of computation.
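The verify-versus-solve asymmetry at the heart of P vs NP can be made concrete with subset-sum, an NP-complete problem: checking a proposed answer (a "certificate") takes linear time, while the obvious search may examine all 2^n subsets. A small illustrative sketch, not tied to any particular result:

```python
from itertools import combinations

def verify_subset_sum(nums, target, certificate):
    """Fast NP-style verification: given a proposed subset, checking
    that it comes from nums and hits the target is linear work."""
    pool = list(nums)
    for c in certificate:
        if c not in pool:
            return False
        pool.remove(c)  # respect multiplicity
    return sum(certificate) == target

def solve_subset_sum(nums, target):
    """Naive solving: without a certificate, try subsets of every size.
    Worst case, this examines all 2^n subsets."""
    for r in range(len(nums) + 1):
        for subset in combinations(nums, r):
            if sum(subset) == target:
                return list(subset)
    return None
```

P vs NP asks, roughly, whether every problem with fast verification (like `verify_subset_sum`) also admits some fast solver, making the exponential search in `solve_subset_sum` avoidable in principle.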

Open

Birch and Swinnerton-Dyer Conjecture, Hodge Conjecture, and others

The remaining Millennium Problems — along with countless other major open conjectures — remain firmly beyond AI's current reach. These likely require not just pattern recognition but entirely novel mathematical insight.