ChatGPT 5.5 Pro Produces PhD-Level Math Research in One Hour

Key Takeaways

- ChatGPT 5.5 Pro completed PhD-level mathematical research in roughly one hour with minimal human guidance
- LLMs can now solve open research problems that human mathematicians missed, not just find existing answers
- The bar for what counts as a 'good first research problem' for new mathematicians has been raised
Timothy Gowers, Fields Medal winner and one of the world's most prominent mathematicians, just reported something that should make anyone paying attention to AI sit up straight. ChatGPT 5.5 Pro produced what he describes as PhD-level mathematical research in about an hour. His contribution? Essentially none.
"We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models," Gowers wrote on his blog. "I have just made a fairly large revision."
From Finding Answers to Finding New Arguments
The initial reaction to LLMs solving math problems was easy to dismiss. Early "solutions" often meant the model found an existing answer in the literature or made an obvious deduction from known results. Mathematicians could comfort themselves: these systems were search engines with better prose, not thinkers.
That comfort is evaporating. LLMs have now solved several of the open Erdős problems, a collection of challenges posed by the legendary mathematician Paul Erdős that have stumped researchers for decades. The problems are tracked on Thomas Bloom's website, and the AI solutions keep coming.
Gowers describes the current state: "LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed, then there is a good chance that the LLMs will spot it." The reason humans missed these arguments varies. Sometimes the problem just hasn't received much attention. Sometimes the solution requires combining techniques from different areas in non-obvious ways.
Testing Against Genuinely Open Problems
Gowers decided to run an experiment. In combinatorics, research papers often introduce new parameters and pose several natural questions about them. Authors can't spend weeks on every question, so some remain open despite having approachable solutions. These problems have traditionally been perfect for PhD students and early-career researchers. Solving an officially open problem builds confidence and credentials.
He fed ChatGPT 5.5 Pro a selection of problems from a paper by Mel Nathanson, titled "Diversity, Equity and Inclusion for Problems in Additive Number Theory." The results were apparently striking enough that Gowers felt compelled to write about them immediately.
What This Means for Mathematical Training
The implications hit hardest for how mathematicians are trained. Finding a first research result is a crucial step. It proves to the student, their advisor, and future employers that they can do original work. If LLMs can now clear the bar that used to define "publishable first result," that bar needs to move.
“It is no longer enough that somebody asks a problem: it needs to be hard enough for an LLM not to be able to solve it.”
— Timothy Gowers
Gowers acknowledges a counter-argument that offers limited comfort: "Quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques." If that's what LLMs are doing, they're doing exactly what mathematicians do. The distinction between "synthesis" and "originality" gets blurry.
The Larger Pattern
This fits a pattern we've seen across fields. AI doesn't need to match top experts to be disruptive. It needs to handle tasks that took humans significant time, or that served as proving grounds for newcomers. First it was radiology residents, junior lawyers doing document review, and entry-level programmers; now it is first-year math PhD students. The work that trained the next generation is becoming optional.
What remains human? Gowers doesn't speculate, but his experiment suggests the remaining territory is smaller than many mathematicians assumed: problems that require genuinely novel approaches, questions that haven't been asked before, and the taste to know which problems matter. Whether those are enough to sustain the traditional pipeline of mathematical talent is an open question.
Frequently Asked Questions
What is ChatGPT 5.5 Pro?
ChatGPT 5.5 Pro is a newer version of OpenAI's language model that Gowers received early access to test. It appears to have significantly improved mathematical reasoning compared to previous versions.
What are Erdős problems?
These are open mathematical problems posed by Paul Erdős, one of history's most prolific mathematicians. They cover various areas of mathematics and have challenged researchers for decades. LLMs have recently solved several of them.
Can LLMs do original mathematical research?
According to Gowers, LLMs can now find arguments that human mathematicians missed. Whether this counts as "original" is debatable, since humans also build on existing knowledge. The practical distinction is becoming less meaningful.
How will this affect math education?
Problems that were once appropriate for first-time researchers may now be solvable by LLMs. This raises the difficulty bar for what counts as meaningful early research, potentially changing how graduate students are trained.
Source: Hacker News: Best
Manaal Khan
Tech & Innovation Writer