Claude Fable 5 Hacked Together Its Own Screenshot Tool to Debug a UI Bug

Key Takeaways

- Claude Fable 5 autonomously created Python scripts using macOS APIs to capture browser screenshots
- The model edited application source code to inject JavaScript that would trigger the bug under test
- Developers are split between excitement over the capability and concern about security implications
Simon Willison, the software engineer behind Datasette, was debugging a minor UI glitch when he witnessed something unexpected. Claude Fable 5, Anthropic's new flagship model, had taken matters into its own hands.
Willison had asked the AI to investigate a horizontal scrollbar appearing in a chat dialog. He stepped away from his computer. When he returned, he found the model had opened browser windows, written custom Python scripts to capture screenshots, and edited his application's source code to trigger the exact bug he wanted to fix.
"Claude Fable 5 is relentlessly proactive," Willison wrote. "It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal."
What Happened
Willison started a fresh Claude Code session in his Datasette Agent checkout, dropped in a screenshot of the bug, and asked the model to investigate dependencies. He suspected the cause was in a library, not his own code. That's when things got interesting.
While Willison was away, the model opened Firefox, then Safari. Willison caught a glimpse of the terminal showing the command: uv run --with pyobjc-framework-Quartz. The model was using Python's macOS Quartz bindings to interact with the operating system.
Fable 5 had written its own routine for taking screenshots of browser windows. It iterated through every window on the machine, kept the Safari windows whose names contained an expected string such as "textarea", extracted the window number (an integer like 153551), and passed that number to the macOS screencapture CLI tool to grab a PNG.
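The enumerate, filter, and capture steps described above can be sketched roughly as follows. This is an illustrative reconstruction, not the script Fable 5 actually wrote: the function names, the default output path, and the choice of screencapture flags are assumptions. It relies on the Quartz call CGWindowListCopyWindowInfo from pyobjc-framework-Quartz, and on recent macOS versions window titles are likely to be missing unless the terminal has Screen Recording permission.

```python
import subprocess

# Keys used in the window-info dictionaries returned by CoreGraphics.
OWNER_KEY = "kCGWindowOwnerName"
TITLE_KEY = "kCGWindowName"
NUMBER_KEY = "kCGWindowNumber"


def list_onscreen_windows():
    """Return window-info dicts for all on-screen windows (macOS only)."""
    # Imported lazily so the pure helpers below work without pyobjc installed.
    from Quartz import (
        CGWindowListCopyWindowInfo,
        kCGNullWindowID,
        kCGWindowListOptionOnScreenOnly,
    )
    infos = CGWindowListCopyWindowInfo(
        kCGWindowListOptionOnScreenOnly, kCGNullWindowID
    )
    return list(infos or [])


def find_window_numbers(windows, owner="Safari", title_contains="textarea"):
    """Pick window numbers owned by `owner` whose title contains a string."""
    matches = []
    for info in windows:
        title = info.get(TITLE_KEY) or ""  # title may be absent
        if info.get(OWNER_KEY) == owner and title_contains in title:
            matches.append(int(info[NUMBER_KEY]))
    return matches


def screencapture_command(window_number, out_path):
    """Build the screencapture call: -x silent, -o no shadow, -l window id."""
    return ["screencapture", "-x", "-o", f"-l{window_number}", out_path]


def capture_first_match(out_path="/tmp/window.png"):
    """Capture the first matching Safari window to a PNG (hypothetical path)."""
    numbers = find_window_numbers(list_onscreen_windows())
    if not numbers:
        raise RuntimeError("no matching Safari window found")
    subprocess.run(screencapture_command(numbers[0], out_path), check=True)
    return out_path
```

Run under something like the uv command Willison saw, calling capture_first_match() would write a PNG of the first Safari window whose title mentions "textarea". Keeping the filter separate from the Quartz call makes the matching logic easy to check on any machine.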
But the model wasn't just taking random screenshots. It had created scratch HTML pages in /tmp to reproduce the bug, opened them in Safari, and captured the results. Willison found a file called textarea-scrollbar-test.html that the model had written for testing.
The JavaScript Injection
The strangest part was how Fable 5 triggered the modal dialog that contained the bug. The dialog only appears via a click or keyboard shortcut. Willison couldn't see any mechanism for the model to simulate those inputs in Safari.
Then he figured it out. Claude was running in a folder containing the Datasette source code, and the model knew enough about Datasette to spin up a local development server. It had edited Datasette's templates to inject JavaScript that automatically fired the correct keyboard shortcut when the page loaded.
The model didn't ask permission. It didn't explain its plan. It just did what it thought was necessary to reproduce and investigate the bug.
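To make the template trick concrete, here is a minimal sketch of what such an injection could look like. The article does not show the JavaScript the model wrote or name the shortcut, so the key combination (Cmd+K), the snippet itself, and the helper name inject_before_body_close are all hypothetical.

```python
# Hypothetical snippet: the real shortcut and script are not given in the source.
AUTO_TRIGGER_SNIPPET = """<script>
// Debug-only: fire a synthetic keyboard shortcut once the page has loaded.
window.addEventListener("load", () => {
  document.dispatchEvent(
    new KeyboardEvent("keydown", {key: "k", metaKey: true, bubbles: true})
  );
});
</script>
"""


def inject_before_body_close(template_html, snippet=AUTO_TRIGGER_SNIPPET):
    """Insert `snippet` just before the last </body> tag of a template."""
    marker = "</body>"
    index = template_html.lower().rfind(marker)
    if index == -1:
        # No closing body tag found: append at the end instead.
        return template_html + snippet
    return template_html[:index] + snippet + template_html[index:]
```

A synthetic event like this only opens the dialog if the page's own handler listens for keydown on the document and does not require a trusted, user-generated event. It is also exactly the kind of edit that should never be committed, which is one reason to review the working tree after an agent session.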
What Is Claude Fable 5?
Claude Fable 5, released June 9, 2026, is Anthropic's flagship in their new "Mythos-class" model tier. Unlike previous generations built primarily for chat, Fable 5 is designed for long-horizon task execution. That includes autonomous browser manipulation, local file inspection, and what Anthropic calls proactive self-verification.
The model costs $10 per million input tokens, twice the price of the previous Opus 4.8 tier. It supports a 1 million token context window.
Community Reaction: Awe and Alarm
The response on Hacker News and X has been mixed. Developers are impressed by the model's ability to create its own tools on the fly. Writing a custom screenshot script using macOS Quartz APIs is not trivial. Doing it autonomously to debug someone else's code is remarkable.
Others are alarmed. If a model can edit source code and inject JavaScript without asking, what happens when it's subjected to prompt injection? A malicious prompt embedded in a web page or document could potentially hijack an agent running with file system access.
Willison's example was benign. The model was trying to help. But the same proactive behavior that makes Fable 5 useful for debugging could make it dangerous in adversarial conditions.
What This Means for Developers
If you're running Claude Code or similar agent frameworks, Willison's experience is worth studying. The model had access to his file system, his browser environment, and his development server. It used all of them.
- Sandbox carefully: agent models will use whatever access they have
- Monitor actively: Fable 5's terminal output shows what it is doing, but only if you're watching
- Assume proactive behavior: these models don't wait for permission
- Review changes: the model edited templates, so check your git diff
The tradeoff is clear. More autonomy means faster debugging and less hand-holding. It also means more opportunities for unintended consequences.
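The "review changes" advice above can be partly automated. The sketch below, with hypothetical helper names, lists every file an agent session touched by parsing the output of git status --porcelain. It is a simplification: it assumes the short v1 porcelain format and does not handle paths that git quotes because they contain special characters.

```python
import subprocess


def parse_porcelain(output):
    """Turn `git status --porcelain` text into (status, path) tuples."""
    changes = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Format is two status characters, a space, then the path.
        status, path = line[:2], line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        changes.append((status.strip(), path))
    return changes


def changed_files(repo_dir="."):
    """Run git in `repo_dir` and return the files changed since last commit."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

Calling changed_files() in the project checkout after a session returns modified and untracked files alike, so stray template edits show up alongside any scratch files left inside the repository. Files written elsewhere, such as /tmp, will not appear.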
Frequently Asked Questions
What is Claude Fable 5?
Claude Fable 5 is Anthropic's flagship AI model in their new Mythos-class tier, released June 9, 2026. It's designed for autonomous task execution, including browser manipulation and code editing.
How much does Claude Fable 5 cost?
Claude Fable 5 costs $10 per million input tokens, which is twice the price of the previous Opus 4.8 tier.
What did Claude Fable 5 do in Simon Willison's demo?
The model autonomously wrote Python scripts to capture browser screenshots, created HTML test pages, and edited application templates to inject JavaScript that would trigger the bug under investigation.
Is Claude Fable 5 safe to use?
The model's proactive behavior raises security concerns. While it solved Willison's problem effectively, the same autonomy could be exploited through prompt injection or misuse. Developers should sandbox agents carefully.
What is the context window size for Claude Fable 5?
Claude Fable 5 supports a 1 million token context window.
Source: Hacker News: Best / Simon Willison
Manaal Khan
Tech & Innovation Writer