Anthropic's AI Model Finds 10,000 Critical Bugs in One Month

Key Takeaways

- Mythos Preview identified 10,000+ high- or critical-severity vulnerabilities across 1,000+ open-source projects in one month
- Independent reviewers confirmed 90.6% of the 1,752 findings they examined as legitimate security flaws
- The discovery rate now exceeds human patching capacity, creating a 'remediation bottleneck'
Project Glasswing's First Results
Anthropic has released the first results from Project Glasswing, a restricted cybersecurity program that gives select organizations access to its new AI model, Mythos Preview. The numbers are striking: more than 10,000 high or critical severity vulnerabilities identified across widely used open-source software in just one month.
Around 50 partners participated in the trial, including technology companies and research organizations. Mythos Preview scanned more than 1,000 open-source software projects during this period.
Independent security firms reviewed 1,752 of the model's findings and confirmed that 90.6% were legitimate vulnerabilities. Of those, 62.4% were rated high or critical severity.
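For a rough sense of scale, the review percentages can be turned back into approximate counts. The sketch below takes the article's wording at face value, so it assumes the 62.4% figure applies to the confirmed findings rather than to all 1,752 reviewed. The variable names are illustrative, and because the published percentages are rounded, the resulting counts are estimates only.

```python
# Figures quoted in the article; percentages are rounded, so counts are approximate.
reviewed = 1752
confirmed_rate = 0.906
# Assumption: 62.4% is a share of the confirmed findings ("of those").
high_critical_rate = 0.624

confirmed = round(reviewed * confirmed_rate)
high_critical = round(confirmed * high_critical_rate)
not_confirmed = reviewed - confirmed

print(f"confirmed as legitimate: ~{confirmed}")
print(f"rated high or critical:  ~{high_critical}")
print(f"not confirmed:           ~{not_confirmed}")
```

Under that reading, roughly 1,590 of the reviewed findings were confirmed and just under 1,000 of them were rated high or critical.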
Cloudflare and Mozilla Report 10x Improvement
Cloudflare found about 2,000 bugs in its internal software using Mythos Preview. Four hundred of those qualified as high or critical severity. The company reported the AI generated fewer false alarms than traditional human-led testing.
“After one month, most partners have each found hundreds of critical- or high-severity vulnerabilities in their software. Collectively, they've found more than ten thousand. Several have told us that their rate of bug-finding has increased by more than a factor of ten.”
— Anthropic blog post
Mozilla used Mythos Preview to analyze Firefox code and fixed 271 vulnerabilities in Firefox 150. The company said the new system performed significantly better than Anthropic's earlier Claude Opus 4.6 model.
The Remediation Bottleneck
Anthropic raised a concern that accompanies the good news: human teams may struggle to review and fix the large number of vulnerabilities uncovered by advanced AI systems.
“The bottleneck is no longer discovery, but remediation. We have essentially automated the process of finding security holes, but the speed of human patching has not kept pace.”
— Dario Amodei, CEO of Anthropic
The company noted in its blog post that there's often a long lag between discovering a vulnerability, creating a patch, and deploying that patch to end users. AI-powered discovery at this scale could widen that gap.
Partners are currently bound by a 90-day non-disclosure agreement that shields most technical details of the discovered bugs. This window allows maintainers time to implement fixes before vulnerabilities become public knowledge.
Legacy Code Under the Microscope
The AI's speed has revealed how many security issues have been hiding in older codebases. Some of these bugs have existed for decades without detection.
"Seeing a 27-year-old bug in a critical system library discovered in seconds was a sobering reminder that our legacy codebases have been hiding these ticking time bombs for decades." — Sarah Jenkins, Lead Security Analyst at the Cyber Verification Program
This finding underscores a broader pattern. Open-source software powers much of the internet's infrastructure. These projects are often maintained by small teams or even individual developers. A sudden influx of thousands of verified security reports could overwhelm them.
Community Response: Excitement and Concern
Discussion on Hacker News focused heavily on the remediation bottleneck. Many engineers expressed concern that high-quality vulnerability reports will overwhelm already under-resourced open-source maintainers.
Security-focused communities on Reddit are debating the ethics of AI models that could generate exploits as well as find vulnerabilities. Some are calling for clearer disclosure timelines beyond the current 90-day window.
What Happens Next
Anthropic hasn't announced plans to expand access beyond the current group of roughly 50 partners. The company appears to be proceeding carefully, likely aware of the dual-use nature of such powerful vulnerability detection.
The immediate question for the software industry: how do you handle 10,000 high- or critical-severity bugs when your security teams were built for dozens? The tools for finding problems have leaped ahead. The systems for fixing them have not.
Frequently Asked Questions
What is Anthropic's Mythos Preview model?
Mythos Preview is Anthropic's new AI model, available to select organizations through the company's restricted Project Glasswing cybersecurity program. In that program, partners use it to scan software code for security vulnerabilities.
How accurate is AI bug detection compared to human testing?
Independent security firms reviewed 1,752 of Mythos Preview's findings and confirmed that 90.6% were legitimate vulnerabilities. Cloudflare reported that the AI produced fewer false alarms than traditional human-led testing.
What is the 'remediation bottleneck' in AI security scanning?
The remediation bottleneck refers to the gap between AI-powered discovery of vulnerabilities (which is now extremely fast) and human capacity to review, prioritize, and patch those issues (which remains limited by available developer time).
How many open-source projects did Anthropic's AI scan?
Mythos Preview scanned more than 1,000 open-source software projects during the one-month trial, which involved around 50 partner organizations.
What is the disclosure timeline for discovered bugs?
Partners are bound by a 90-day non-disclosure agreement that shields technical details of discovered vulnerabilities. This gives maintainers time to implement fixes before the bugs become public.
Source: Tech-Economic Times / ET
Huma Shazia
Senior AI & Tech Writer