GLM-5.2 vs Claude Opus: open weights win on cost, lose on speed

Key Takeaways

- Opus built a cleaner 3D game in about 33 minutes; GLM-5.2 took about 71 minutes but cost $5.39 versus roughly $22
- GLM-5.2 is text-only with no vision, so workflows using screenshots still need a multimodal model
- Open weights mean a downloaded copy of GLM-5.2 can't be deprecated or restricted by the vendor, a real consideration after recent model retirements

Z.ai released GLM-5.2, and the internet immediately started arguing about whether it matches Claude Opus. TechStackups ran a direct comparison: same prompt, same assets, build a 3D platformer in raw WebGL. Opus finished in under half the time and produced cleaner code. GLM-5.2 cost about a quarter as much. Neither result is surprising, but the specifics matter.
The test asked each model to write a browser game from scratch. No Three.js, no engine. The model had to parse GLB files, write GLSL shaders, handle skeletal animation, collision detection, a follow camera. This is the kind of multi-file, multi-step build that exposes whether a model can hold context over a long run.
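To make "parse GLB files" concrete: a GLB file is a 12-byte header followed by length-prefixed chunks, normally one JSON chunk describing the scene and one binary chunk holding vertex and animation data. The sketch below is written in Python rather than the browser JavaScript the models produced, and it illustrates the container format only; it is not code from the TechStackups run. The function names `parse_glb` and `build_glb` are hypothetical.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN\0"


def parse_glb(data: bytes) -> tuple[dict, bytes]:
    """Split a GLB container into its JSON scene description and binary buffer."""
    magic, version, total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file")
    if version != 2:
        raise ValueError(f"unsupported GLB version {version}")
    if total_length != len(data):
        raise ValueError("declared length does not match file size")

    scene, binary = None, b""
    offset = 12  # the first chunk starts right after the 12-byte header
    while offset < total_length:
        chunk_length, chunk_type = struct.unpack_from("<II", data, offset)
        start = offset + 8
        payload = data[start:start + chunk_length]
        if chunk_type == CHUNK_JSON:
            scene = json.loads(payload.decode("utf-8"))
        elif chunk_type == CHUNK_BIN:
            binary = payload
        offset = start + chunk_length  # unknown chunk types are skipped

    if scene is None:
        raise ValueError("GLB has no JSON chunk")
    return scene, binary


def build_glb(scene: dict, binary: bytes = b"") -> bytes:
    """Build a tiny GLB in memory; used here only to exercise parse_glb."""
    json_bytes = json.dumps(scene).encode("utf-8")
    json_bytes += b" " * (-len(json_bytes) % 4)  # JSON chunk is space-padded to 4 bytes
    binary += b"\x00" * (-len(binary) % 4)       # BIN chunk is zero-padded to 4 bytes
    chunks = struct.pack("<II", len(json_bytes), CHUNK_JSON) + json_bytes
    if binary:
        chunks += struct.pack("<II", len(binary), CHUNK_BIN) + binary
    header = struct.pack("<III", GLB_MAGIC, 2, 12 + len(chunks))
    return header + chunks


scene, binary = parse_glb(build_glb({"asset": {"version": "2.0"}}, b"\x01\x02\x03"))
```

Splitting the container is the easy part. A real loader then has to follow the JSON's accessors and buffer views into the binary chunk to recover meshes, skins, and animation tracks, which is where most of the work in a task like this sits.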
How did GLM-5.2 compare to Opus on raw numbers?

| Metric | GLM-5.2 | Claude Opus 4.8 |
|---|---|---|
| Build time | 1h 10m 40s | 33m 30s |
| Output tokens | 131,000 | 216,809 |
| Tool calls | 128 | 153 |
| Cost | $5.39 | ~$21.92 |

Opus shipped faster despite generating more tokens. The TechStackups team attributed this to Opus needing fewer corrections: it got things right the first time more often, so the total run was shorter even though it talked more. GLM-5.2 iterated more, backtracked more, but still reached a working game.
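The "faster despite more tokens" point is easy to check from the reported figures. The sketch below derives wall-clock rates from the table; the `Run` class is a hypothetical helper, and the rates include tool execution and waiting time, so they should not be read as raw decoding speed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One model's reported results from the comparison table."""
    name: str
    seconds: int
    output_tokens: int
    tool_calls: int

    @property
    def minutes(self) -> float:
        return self.seconds / 60

    @property
    def tokens_per_minute(self) -> float:
        return self.output_tokens / self.minutes

    @property
    def tool_calls_per_minute(self) -> float:
        return self.tool_calls / self.minutes


glm = Run("GLM-5.2", 1 * 3600 + 10 * 60 + 40, 131_000, 128)
opus = Run("Claude Opus 4.8", 33 * 60 + 30, 216_809, 153)

for run in (glm, opus):
    print(f"{run.name}: {run.minutes:.1f} min, "
          f"{run.tokens_per_minute:.0f} output tokens/min, "
          f"{run.tool_calls_per_minute:.2f} tool calls/min")

print(f"GLM-5.2 took {glm.seconds / opus.seconds:.2f}x as long as Opus")
```

Over the whole run, that works out to roughly 1,850 output tokens per minute for GLM-5.2 against roughly 6,470 for Opus, with GLM-5.2 taking a little over twice as long.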
On cost, the gap is stark. GLM-5.2 charges $1.40 per million input tokens and $4.40 per million output tokens. Opus charges $5 and $25 respectively. For a long agentic run, that difference compounds.
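A quick calculation shows why the difference compounds. The sketch below prices only the reported output tokens at the list rates and compares that with the reported total. It assumes the totals consist purely of token charges at those rates. The article gives no input or cached token counts, so the remainder is an inference, not a reported number.

```python
PRICES_PER_MILLION = {  # USD list prices quoted in the article
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}

REPORTED = {  # output tokens and total cost from the comparison
    "GLM-5.2": {"output_tokens": 131_000, "total_cost": 5.39},
    "Claude Opus 4.8": {"output_tokens": 216_809, "total_cost": 21.92},  # total is approximate
}


def output_spend(model: str) -> float:
    """Cost of the reported output tokens alone, in USD."""
    tokens = REPORTED[model]["output_tokens"]
    return tokens / 1_000_000 * PRICES_PER_MILLION[model]["output"]


def non_output_spend(model: str) -> float:
    """Whatever is left of the reported total after output tokens (assumed to be input-side)."""
    return REPORTED[model]["total_cost"] - output_spend(model)


for model in REPORTED:
    share = output_spend(model) / REPORTED[model]["total_cost"]
    print(f"{model}: output ${output_spend(model):.2f} ({share:.0%} of total), "
          f"remainder ${non_output_spend(model):.2f}")
```

Under that assumption, output tokens account for only about 11% of the GLM-5.2 bill and about 25% of the Opus bill. The rest would be input-side spend, which is consistent with an agentic loop that re-reads its growing context on every tool call.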
What is GLM-5.2 and why does open weights matter?
GLM-5.2 is Z.ai's flagship model, released under an MIT license. You can download the weights from Hugging Face or ModelScope and run it locally. Or you can call it through Z.ai's API or OpenRouter.
The model ships with a 1M-token context window and two thinking modes, High and Max, that trade latency for reasoning depth. It's built for long-horizon tasks, the kind of sustained coding work that runs for an hour or more.
One hard constraint: GLM-5.2 is text-only. It cannot read images. Any workflow that depends on screenshots, diagrams, or visual verification still needs a multimodal model. Opus can look at its own output and catch visual bugs. GLM-5.2 cannot.
The open-weights angle is not just about cost. Closed models can disappear. Fable's recent deprecation reminded developers that an API you depend on can be shut down with little notice. Weights you download cannot be taken away. For teams building products on top of these models, that's a real risk consideration.
Why a WebGL platformer as the test?
The community already discounts zero-shot landing pages as a serious test. A model can produce something that looks impressive in one file. A 3D game in raw WebGL can't be faked that way. It requires a GLB parser, matrix math, shaders, animation, collision, a game loop. The pieces have to fit together across multiple files over many steps.
This tests two things at once. The agentic part: can the model hold a layered, multi-file build together over dozens of tool calls? The reasoning part: can it get engine internals right, the code that looks fine but quietly breaks?
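As an illustration of "looks fine but quietly breaks", consider transform order. This is a generic example and not a bug reported in the TechStackups run. It is written in Python for brevity, and the helper names are hypothetical. With column vectors, the rightmost matrix in a product is applied first, so swapping two factors still runs without error but places the object somewhere else.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(x: float, y: float, z: float) -> Matrix:
    return [[1, 0, 0, x],
            [0, 1, 0, y],
            [0, 0, 1, z],
            [0, 0, 0, 1]]


def rotation_y(angle: float) -> Matrix:
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0],
            [0, 1, 0, 0],
            [-s, 0, c, 0],
            [0, 0, 0, 1]]


def apply(m: Matrix, point: tuple[float, float, float]) -> tuple[float, ...]:
    """Transform a 3D point by a 4x4 matrix (w is assumed to be 1)."""
    v = (*point, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


turn = rotation_y(math.pi / 2)
move = translation(5.0, 0.0, 0.0)

# Column-vector convention: the rightmost matrix is applied first.
rotate_then_move = matmul(move, turn)  # spin in place, then step 5 units along x
move_then_rotate = matmul(turn, move)  # step along x, then swing around the origin

print(apply(rotate_then_move, (1.0, 0.0, 0.0)))  # close to (5, 0, -1)
print(apply(move_then_rotate, (1.0, 0.0, 0.0)))  # close to (0, 0, -6)
```

Both products are valid code and both render something, which is why this class of bug is hard to catch without looking at the result.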
Both models used the same CC0 assets from Kenney's Platformer Kit. The test was the engine and rendering, not asset loading.
Should you switch from Opus to GLM-5.2?
TechStackups says no, not as a primary. Opus was faster, shipped cleaner code, and can visually verify its output. For their main coding workflows, Opus stays.
But GLM-5.2 earns a permanent slot in the toolkit. At a quarter of the price, it handles long agentic runs well enough for many tasks. And the open weights mean it will always be available. The team's framing: Opus for the work that needs to be right on the first pass, GLM-5.2 for the work where you can iterate and the cost matters.
The comparison also surfaces a broader point about the open-versus-closed model debate. Open models are closing the capability gap. GLM-5.2 is not quite Opus, but it's close enough that for many tasks, the price difference makes it the better choice.
Pricing breakdown: GLM-5.2 vs Opus per million tokens

| Model | Input | Cache read | Output |
|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $0.50 | $25.00 |
| GLM-5.2 | $1.40 | $0.26 | $4.40 |

On output tokens, GLM-5.2 is less than a fifth the cost of Opus. For long runs that generate hundreds of thousands of tokens, this adds up fast.
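The discount is not uniform across the three price categories, which matters for cache-heavy agentic runs. The sketch below computes the per-category ratios from the listed prices and then prices one token mix. That mix is hypothetical and chosen only for illustration; the article does not report input or cache-read counts.

```python
PRICES = {  # USD per million tokens, as listed in the pricing breakdown
    "Claude Opus 4.8": {"input": 5.00, "cache_read": 0.50, "output": 25.00},
    "GLM-5.2": {"input": 1.40, "cache_read": 0.26, "output": 4.40},
}


def price_ratio(kind: str) -> float:
    """GLM-5.2 price as a fraction of the Opus price for one token category."""
    return PRICES["GLM-5.2"][kind] / PRICES["Claude Opus 4.8"][kind]


def run_cost(model: str, input_tokens: int, cache_read_tokens: int,
             output_tokens: int) -> float:
    """Estimated USD cost of a run at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"]
            + cache_read_tokens * p["cache_read"]
            + output_tokens * p["output"]) / 1_000_000


for kind in ("input", "cache_read", "output"):
    print(f"{kind}: GLM-5.2 costs {price_ratio(kind):.1%} of the Opus price")

# Hypothetical token mix, not taken from the comparison.
mix = {"input_tokens": 200_000, "cache_read_tokens": 3_000_000, "output_tokens": 150_000}
for model in PRICES:
    print(f"{model}: ${run_cost(model, **mix):.2f}")
```

The ratios come out to 28% on input, 52% on cache reads, and 17.6% on output. For the hypothetical mix, the totals are about $1.72 for GLM-5.2 and $6.25 for Opus. So the more a run leans on cached context, the smaller the saving relative to the headline output-token ratio.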
Logicity's Take
The interesting signal here isn't that Opus is better. It's that GLM-5.2 is good enough. Two years ago, open models couldn't touch proprietary ones on complex coding tasks. Now the gap is speed and polish, not capability. For teams with budget constraints or long-running batch jobs, GLM-5.2 is a serious option. The open-weights insurance against API deprecation is a bonus that won't show up in benchmarks but matters when you're building a product.
Frequently Asked Questions
Can GLM-5.2 process images like Claude Opus?
No. GLM-5.2 is text-only. It cannot read images, screenshots, or diagrams. Workflows that require visual input still need a multimodal model like Opus.
How much cheaper is GLM-5.2 than Claude Opus?
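At list prices, GLM-5.2 costs 72% less on input ($1.40 versus $5 per million tokens) and about 82% less on output ($4.40 versus $25). In the TechStackups run, the total bill was about 75% lower ($5.39 versus roughly $21.92).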
Is GLM-5.2 fully open source?
Yes, in the open-weights sense. The weights are available under an MIT license on Hugging Face and ModelScope. You can run it locally with vLLM, SGLang, or Transformers.
Which model is better for coding agents?
Opus is faster and more accurate on first-pass output. GLM-5.2 is viable for long agentic runs where cost matters and you can tolerate more iteration.
Source: Hacker News: Best
Manaal Khan
Tech & Innovation Writer