AI in Mathematics
A progress ladder showing how far AI has climbed in mathematical reasoning — from solving textbook problems to competing at the International Mathematical Olympiad, publishing in top journals, and the distant summit of open millennium-prize problems.
Automation Progress
AI systems now routinely score above 90% on standard benchmarks such as GSM8K (grade-school word problems) and MATH (high-school competition problems), and handle problems from competitions like AMC/AIME with ease. This level is considered solved.
GPT-4o scores 76.6% on MATH benchmark
OpenAI's GPT-4o achieved 76.6% on the MATH dataset of competition-level problems, up from the roughly 5% that the best language models managed when the benchmark was introduced in 2021.
OpenAI o1 reaches 94.8% on MATH
The o1 reasoning model achieved 94.8% on MATH and 83.3% on AIME 2024, demonstrating that chain-of-thought reasoning dramatically lifts mathematical performance.
AI has progressed from silver to gold at the IMO — widely considered the hardest pre-university math competition in the world, requiring deep creative reasoning and rigorous proof.
AlphaGeometry nears gold-medal level on IMO geometry
DeepMind's AlphaGeometry solved 25 of 30 historical IMO geometry problems, approaching gold-medal performance in geometry specifically.
AlphaProof + AlphaGeometry2 reach IMO silver (28/42)
DeepMind's combined system solved 4 of 6 problems at the 2024 IMO, scoring 28/42, making it the first AI to reach silver-medal standard (as graded by IMO gold medallists). It produced formal proofs in the Lean proof assistant, trained via reinforcement learning, though some solutions took days of computation.
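For a sense of what "formal" means here: a Lean proof is code that the proof assistant checks step by step, so correctness is machine-verified rather than judged by humans. Below is a toy Lean 4 sketch of our own, far simpler than any IMO problem and not AlphaProof's actual output:

```lean
import Mathlib

-- Toy example of a machine-checkable statement: the sum of two
-- squares of integers is non-negative. Lean verifies each step;
-- an incorrect proof simply fails to compile.
theorem sum_of_squares_nonneg (a b : ℤ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```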
Both OpenAI & DeepMind achieve IMO gold (35/42)
OpenAI's experimental reasoning model and DeepMind's Gemini Deep Think both scored 35/42 (5 of 6 problems) — gold-medal standard. Both used natural-language proofs completed within the 4.5-hour time limit, a major leap from 2024's multi-day formal approach.
Beyond solving known problems, AI has begun producing genuinely new mathematical discoveries — finding novel conjectures, beating decades-old bounds, and publishing results in Nature.
DeepMind AI discovers new knot-theory conjecture
Working with mathematicians from Oxford and Sydney, DeepMind's ML system discovered a previously unknown relationship between algebraic and geometric invariants of knots, leading to a new theorem. Published as a Nature cover story.
AlphaTensor discovers faster matrix multiplication algorithms
DeepMind's AlphaTensor used reinforcement learning to discover novel matrix-multiplication algorithms, beating Strassen's 50-year-old record for 4×4 matrices (in modular arithmetic) and improving on the best known algorithms for more than 70 other matrix sizes. Nature cover story.
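To make "fewer multiplications" concrete: Strassen's classical scheme multiplies 2×2 matrices with 7 scalar multiplications instead of the naive 8, and AlphaTensor searched for analogous decompositions at larger sizes. A minimal Python sketch of the classical baseline (our own illustration, not DeepMind's code):

```python
# Strassen's 2x2 scheme: 7 multiplications instead of the naive 8.
# Applied recursively to large matrices, the saving compounds into an
# O(n^2.81) algorithm; AlphaTensor found similar schemes beyond 2x2.
def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]], matching the ordinary matrix product
```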
FunSearch breaks 20-year record on cap set problem
DeepMind's FunSearch paired an LLM with an automated evaluator to discover new constructions for the cap set problem in extremal combinatorics, finding the largest known cap set in dimension 8, the biggest advance on the problem in 20 years, and surpassing the best human-designed constructions. Published in Nature.
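For context, a cap set is a collection of points in n-dimensional space over the three-element field with no three points on a line, which is equivalent to no three distinct points summing to zero in every coordinate. FunSearch evolved programs that build candidate sets and scored them with an automated checker along these lines (a hypothetical Python sketch, not FunSearch's actual code):

```python
from itertools import combinations

def is_cap_set(points):
    """Check the cap-set property over F_3^n: no three distinct
    points may sum to zero (mod 3) in every coordinate, which is
    equivalent to no three points lying on a line."""
    for a, b, c in combinations(points, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False
    return True

# Toy instance in dimension 2, where the maximum cap set has size 4;
# FunSearch's record-breaking constructions live in dimension 8.
print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)]))  # True
```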
AlphaProof paper published in Nature
The full AlphaProof system, demonstrating olympiad-level formal mathematical reasoning via reinforcement learning in Lean, was published in Nature — validating AI's ability to generate machine-verifiable proofs at the highest competition level.
Can AI tackle problems that take professional mathematicians hours or days? The FrontierMath benchmark tests exactly this — hundreds of original, expert-crafted research-level problems. The best models are still far from human expert performance, but progress is rapid.
FrontierMath benchmark: SOTA models solve <2%
At the benchmark's launch in late 2024, the best AI models (including o1) could solve fewer than 2% of FrontierMath's expert-level problems, which span number theory, algebraic geometry, and category theory. Fields Medalist Terence Tao called the problems "extremely challenging" even for expert mathematicians.
Rapid progress: GPT-5 series reaches ~26%, GPT-5.4 hits 47.6%
Within a year, performance surged: GPT-5 scored roughly 26.3%, and the latest GPT-5.4 solves 47.6% of FrontierMath problems, a remarkable leap, though still below expert human level on the hardest problems.
AI as research co-pilot (Tao's vision)
Fields Medalist Terence Tao envisions AI as a "co-pilot" for mathematical research — not replacing mathematicians but accelerating discovery by handling tedious computations, suggesting proof strategies, and formalizing arguments in proof assistants like Lean.
The summit: solving problems that have resisted all human efforts for decades or centuries. The seven Millennium Prize Problems (six remain unsolved) represent the pinnacle of mathematical difficulty. No AI has made meaningful direct progress on any of them. These problems likely require entirely new mathematical frameworks, not just better computation.
Riemann Hypothesis (1859)
Concerns the distribution of prime numbers. Unproven for over 160 years. Widely considered one of the deepest unsolved problems in all of mathematics. No AI system has made direct progress.
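For reference, the standard statement: every non-trivial zero of the Riemann zeta function lies on the critical line, which would pin down the error term in counting primes.

```latex
% Standard statement of the Riemann Hypothesis.
\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}
\quad (\operatorname{Re}(s) > 1,\ \text{extended by analytic continuation}),
\]
\[
\zeta(s) = 0 \ \text{ and } \ 0 < \operatorname{Re}(s) < 1
\ \Longrightarrow\ \operatorname{Re}(s) = \tfrac{1}{2}.
\]
```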
P vs NP (1971)
The most important open question in theoretical computer science. Asks whether every problem whose solution can be quickly verified can also be quickly solved. Resolving it would reshape cryptography, optimization, and our understanding of computation.
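The verify-versus-solve asymmetry is easy to see in code. Subset Sum is a classic NP-complete problem: checking a claimed solution takes linear time, while every known general solver takes exponential time in the worst case. A minimal Python sketch of our own:

```python
from itertools import combinations

def verify(nums, target, indices):
    """Polynomial-time check of a claimed certificate (the 'NP' side)."""
    return sum(nums[i] for i in indices) == target

def solve(nums, target):
    """Brute-force search over all 2^n subsets (exponential time).
    No known algorithm does fundamentally better in the worst case."""
    for r in range(len(nums) + 1):
        for indices in combinations(range(len(nums)), r):
            if verify(nums, target, indices):
                return indices
    return None

nums = [3, 34, 4, 12, 5, 2]
cert = solve(nums, 9)               # slow in general: exponential search
print(cert, verify(nums, 9, cert))  # (2, 4) True -- fast to check
```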
Birch and Swinnerton-Dyer Conjecture, Hodge Conjecture, and others
The remaining Millennium Problems — along with countless other major open conjectures — remain firmly beyond AI's current reach. These likely require not just pattern recognition but entirely novel mathematical insight.