AI shows uneven progress in mathematics. While it can solve complex geometry problems from the International Math Olympiad (IMO) almost instantly, it struggles with combinatorics, which requires more playful, puzzle-like creativity. This highlights the 'spiky frontier' of AI capabilities, where proficiency in one domain doesn't guarantee it in another, closely related one.
The true test of advanced AI in mathematics will not be solving existing problems such as the Millennium Prize problems, but generating novel, interesting conjectures and creating new, unifying definitions. This represents a higher tier of mathematical creativity, akin to the work of the greatest mathematicians, who frame the questions for others to solve.
Évariste Galois's development of group theory provides a historical precedent for revolutionary concepts that are not immediately recognized or useful. It took over a century for the full value of his ideas to be appreciated in fields like physics and cryptography, showing that the most profound insights may fail immediate 'verification', whether by peers or through practical application.
Future AI-driven mathematical discoveries will likely follow two paths. One is finding 'lightning bolt' connections between existing, disparate fields (e.g., number theory and physics). The other, more profound path is 'mountain building': constructing entirely new theoretical frameworks, a skill that would signify a much higher level of general intelligence.
As AIs automate theorem proving and even explanation, the role of human mathematicians will shift. Instead of being creators, they will act as curators, using their taste and social connection to guide others through the vast, AI-generated landscape of mathematical ideas. Their value will lie in providing motivation and a human-centric narrative.
A key, underappreciated advantage of AI is its potential for systematic context-switching. Unlike humans, who tend to get stuck in a single line of reasoning, AI systems can be set up to pursue contradictory goals at the same time (e.g., proving and disproving a theorem) or be given different starting biases, allowing them to escape cognitive ruts and explore a problem space more thoroughly.
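As a rough illustration of pursuing contradictory goals at once, the sketch below runs a "prover" and a "disprover" side by side on a toy claim: that n^2 + n + 41 is prime for every n below some limit. Everything here is an assumption made for illustration (the claim, the function names, the bounded search); it is not how any particular AI system works. The disprover scans from the top of the range down, standing in for a different starting bias.

```python
from concurrent.futures import ThreadPoolExecutor

def is_prime(n: int) -> bool:
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def claim(n: int) -> bool:
    """Toy claim (hypothetical example): n^2 + n + 41 is prime."""
    return is_prime(n * n + n + 41)

def try_prove(limit: int) -> bool:
    """Bounded 'proof': succeeds only if the claim holds for every n < limit."""
    return all(claim(n) for n in range(limit))

def try_disprove(limit: int):
    """Hunt for a counterexample, scanning from the top down (a different bias)."""
    for n in range(limit - 1, -1, -1):
        if not claim(n):
            return n
    return None

def investigate(limit: int):
    """Pursue both goals in parallel and report whichever one succeeds."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        proof = pool.submit(try_prove, limit)
        refutation = pool.submit(try_disprove, limit)
        counterexample = refutation.result()
        if counterexample is not None:
            return ("disproved", counterexample)
        if proof.result():
            return ("proved", None)
    return ("open", None)
```

Calling `investigate(40)` reports the bounded claim as proved, while `investigate(41)` reports a counterexample at n = 40, since 40^2 + 40 + 41 = 41^2. In a real system the two workers would be far more sophisticated, but the structure of racing opposing goals is the same.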
Verifiability alone doesn't explain AI's rapid progress in math and coding. The key factor is 'grindability'—the ability to run thousands of parallel, containerized, and deterministic simulations. This allows for efficient credit assignment and learning, a luxury not available in domains like e-commerce or business strategy, which are constrained by real-world interactions and bot detectors.
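A minimal sketch of what 'grindability' might look like in code, assuming a toy task with a checkable answer: each episode gets its own seeded random generator, so it is deterministic and isolated, and many episodes run in parallel. The task, the names `run_episode` and `grind`, and the reward rule are all hypothetical stand-ins, not a description of any real training pipeline.

```python
import random
from concurrent.futures import ThreadPoolExecutor

TARGET = 1369  # toy task (assumed for illustration): find x with x * x == TARGET

def run_episode(seed: int) -> tuple[int, int, float]:
    """One isolated rollout: deterministic given its seed, with a checkable reward."""
    rng = random.Random(seed)  # private RNG, so episodes share no state
    guess = rng.randint(1, 100)
    reward = 1.0 if guess * guess == TARGET else 0.0
    return seed, guess, reward

def grind(seeds, workers: int = 8):
    """Run many independent episodes in parallel; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_episode, seeds))

results = grind(range(1000))
mean_reward = sum(reward for _, _, reward in results) / len(results)
```

Because every reward is tied to a seed, any surprising episode can be replayed exactly, which is what makes credit assignment cheap here. A domain that depends on live customers or third-party websites offers no equivalent of rerunning `run_episode(seed)`.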
There's a critical distinction between a proof (which establishes truth) and an explanation (which provides understanding). Even when a complex mathematical problem is solved, there remains an 'unsolved expository problem' of making the solution comprehensible. This need for clarity and intuition will remain a crucial area for human or AI effort, even after theorems are proven.
AIs struggle with mentalizing and empathy because they lack embodiment. Citing a study where Botox users became worse at reading facial expressions, Sanderson suggests our ability to understand others' emotions is partly based on subconsciously mimicking them. AIs, being disembodied, cannot perform this mimicry, leading to a fundamental deficit in their 'theory of mind.'
Natural-language proofs require human verification, whereas formal systems like Lean allow for automated, verifiable rewards. This could enable an AI to endlessly extend a mathematical library like Mathlib, exploring a vast tree of logic and potentially discovering novel theories without any human check-ins, similar to how AlphaGo trained itself by playing millions of games.
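To make the idea of an automated, verifiable reward concrete, here is a small Lean 4 sketch. The theorem names are made up for illustration, and the statements are deliberately trivial; the point is only that the checker either accepts a proof or rejects it, with no human in the loop.

```lean
-- Accepted: both sides compute to the same value, so `rfl` closes the goal.
theorem two_plus_two : 2 + 2 = 4 := rfl

-- Accepted: `n + 0` reduces to `n` by the definition of addition on `Nat`.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Accepted: reuses a lemma that already exists in Lean's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- A false statement such as `2 + 2 = 5` could not be proved this way:
-- the checker would reject `rfl`, which is exactly the binary signal
-- an automated training loop could use as a reward.
```

Each accepted theorem becomes a building block for later ones, which is loosely how a library like Mathlib grows.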
There is a strong correlation between creating genuinely novel insights and being able to explain them clearly. Figures like Einstein, Claude Shannon, and Feynman wrote lucid, accessible papers. This suggests the same part of the brain that formulates a new way of thinking is also adept at communication, debunking the 'expert's curse' myth for true pioneers.
The auto-regressive, next-token-prediction nature of current LLMs is a 'really, really weird way to produce stuff.' True human creativity and writing insight involve knowing precisely when to make an unpredictable, non-obvious move. This is directly contrary to the model's core process, which is a slave to its immediate context and favors predictable outputs.
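The pull toward predictable output can be seen in a toy version of next-token generation. The sketch below uses a hand-written probability table (entirely hypothetical) and conditions only on the previous token, whereas a real LLM conditions on the whole context with a learned model. Greedy decoding always takes the most likely continuation; sampling allows a less likely token, but only by chance, not because the model judged that a surprising move was called for.

```python
import random

# Hypothetical next-token probabilities, keyed by the previous token.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.3, "theorem": 0.1},
    "cat": {"sat": 0.8, "proved": 0.2},
    "dog": {"sat": 1.0},
    "sat": {"down": 0.9, "still": 0.1},
}

def greedy_decode(start: str, max_new_tokens: int) -> list[str]:
    """Always append the single most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        options = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not options:
            break  # no known continuation
        tokens.append(max(options, key=options.get))
    return tokens

def sample_decode(start: str, max_new_tokens: int, rng: random.Random) -> list[str]:
    """Append a next token drawn in proportion to its probability."""
    tokens = [start]
    for _ in range(max_new_tokens):
        options = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not options:
            break
        words = list(options)
        weights = [options[w] for w in words]
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens
```

Here `greedy_decode("the", 3)` yields `["the", "cat", "sat", "down"]` every time. Getting "the theorem" out of `sample_decode` requires a lucky draw from a seeded `random.Random`, which illustrates the gap between random deviation and a deliberate non-obvious choice.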
