Claude Mythos Finds 10,000 Bugs in a Month. Patches Can't Keep Up

Manaal KhanMay 23, 2026 at 1:33 PM5 min read

Key Takeaways

Claude Mythos Preview found over 10,000 high or critical severity vulnerabilities in system-critical software within one month
Partners like Cloudflare and Mozilla report tenfold increases in bug discovery rates compared to previous AI models
Anthropic warns that AI now detects vulnerabilities faster than organizations can patch them, creating a growing security gap

Project Glasswing delivers first results

One month after launching Project Glasswing, Anthropic has published its first findings. The results are striking. Claude Mythos Preview, working with roughly 50 partners, has identified more than 10,000 high or critical severity vulnerabilities in system-critical software.

The problem: the model now spots security flaws faster than teams can verify, disclose, and patch them. Anthropic is withholding specific technical details because the standard industry deadline for vulnerability disclosure is 90 days. Most findings cannot be described yet without putting end users at risk.

10,000+

High or critical severity vulnerabilities found by Claude Mythos Preview in one month across system-critical software

Partners report tenfold jumps in bug detection

Glasswing partners run and build software that powers the internet and other critical infrastructure. Each partner has found hundreds of critical vulnerabilities. Several report their bug-finding rate jumped more than tenfold.

Cloudflare flagged 2,000 bugs, with 400 of them rated high or critical severity. The company says the model's false positive rate beat human testers. Mozilla found and fixed 271 vulnerabilities in Firefox 150. That's more than ten times what Claude Opus 4.6 caught in Firefox 148.

Independent validation backs the claims

Outside reviewers support these numbers. The UK's AI Security Institute says the latest Mythos Preview checkpoint is the first model to fully solve both of its in-house cyber ranges. These are simulated multi-stage cyberattacks designed to test AI capabilities.

Independent security platform XBOW calls the model a major step beyond all prior models, citing "unprecedented precision." Anthropic says Mythos Preview also tops the academic benchmarks ExploitBench and ExploitGym. GPT-5.5 comes close in most benchmarks and is already openly available.

Patch volumes are surging

The impact is showing up in software updates. Palo Alto Networks shipped five times as many patches as usual in its latest release. Microsoft said the number of new patches will "continue trending larger for some time." Oracle claims it's finding and fixing flaws several times faster than before.

Mythos Preview has proven useful beyond hunting bugs. At one partner bank, the model helped catch and block a fraudulent wire transfer worth over $1.5 million.

Open-source scanning finds 6,000+ potential flaws

Alongside partner work, Anthropic says it scanned more than 1,000 open-source projects. The company identified over 6,000 potential flaws. These projects underpin much of the software that businesses and individuals rely on daily.

The sheer volume creates a bottleneck. Security teams must verify each finding, determine its severity, coordinate with maintainers, and develop patches. All of this takes time. The AI has no such constraints.

The dangerous transition period

Anthropic warns of a dangerous transition period. AI models like Claude Mythos can detect vulnerabilities far faster than organizations can patch them. This creates a growing security gap.

If attackers gain access to similar AI capabilities, they could exploit vulnerabilities before defenders can respond. The 90-day disclosure window that has served the security industry for years may not survive contact with AI-accelerated discovery.

Organizations now face a choice. They can adopt AI-powered security tools and deal with the flood of findings. Or they can wait and hope their software isn't among the thousands of vulnerable systems being catalogued.

Flowchart by Anthropic showing the triage pipeline for open-source vulnerabilities: of 23,019 candidates discovered, 1,900 were reviewed by external security firms, 1,726 confirmed valid, and 467 reported to maintainers. Another 1,129 findings were sent directly to maintainers at their request without extra review. In total, 1,596 findings were reported to maintainers, 1,451 acknowledged, but only 97 actually patched and 88 given public security advisories. Counts as of May 22, 2026.

ℹ️

Logicity's Take

Frequently Asked Questions

What is Claude Mythos Preview?

Claude Mythos Preview is Anthropic's latest AI model, designed for advanced security analysis. It can identify vulnerabilities in software at a rate that exceeds human security researchers.

What is Project Glasswing?

Project Glasswing is Anthropic's initiative partnering with about 50 organizations to use Claude Mythos Preview for finding security vulnerabilities in critical infrastructure software.

How many vulnerabilities did Claude Mythos find?

Claude Mythos Preview identified more than 10,000 high or critical severity vulnerabilities in system-critical software within one month of Project Glasswing's launch.

Why is Anthropic concerned about AI-powered bug detection?

Anthropic warns that AI models now find vulnerabilities faster than organizations can verify and patch them. This creates a security gap that could be exploited if attackers gain access to similar AI capabilities.

How does Claude Mythos compare to GPT-5.5 in security benchmarks?

According to Anthropic, Claude Mythos Preview tops the ExploitBench and ExploitGym academic benchmarks, with GPT-5.5 being close in most benchmarks. GPT-5.5 is already openly available.

ℹ️

Need Help Implementing This?

Source: The Decoder / Matthias Bastian

Jeff Bezos is closing a $10 billion funding round for Project Prometheus, an AI lab focused on physics-based AI for manufacturing and engineering. With a $38 billion valuation and backing from JPMorgan and BlackRock, this signals a major shift in enterprise AI investment toward industrial applications.

21 Apr 2026

AI & Machine Learning·7 min

Kimi K2.6 Open-Weight AI: 300 Agents at a Fraction of the Cost

Moonshot AI's Kimi K2.6 matches GPT-5.4 and Claude Opus 4.6 on coding benchmarks while running 300 parallel agents. For businesses locked into expensive API contracts, this open-weight model could slash AI infrastructure costs while delivering enterprise-grade automation.

20 Apr 2026

AI & Machine Learning·7 min

AI Vendor Lock-In Risk: Anthropic Suspensions Hit Fintech

A Latin American fintech lost access to 60+ Claude accounts overnight with no warning, exposing dangerous single-vendor dependencies. The incident offers critical lessons for any business building AI into core operations.

20 Apr 2026

Also Read

GPT-5.6 vs Grok 4.5 vs Claude: 12 AI models build 4 apps

Startups & Innovation·6 min