Moonshot's Kimi K2.7 Code Costs 12x Less Than GPT-5.5

Key Takeaways

- Kimi K2.7 Code costs $0.95 per million input tokens and $4.00 per million output tokens, up to 12x cheaper than frontier competitors
- The model scores 81.1 on MCPMark Verified, beating Claude Opus 4.8's 76.4 on this agent-focused benchmark
- A Mixture-of-Experts architecture uses only 32 billion of its 1 trillion parameters per token, enabling efficiency gains
Moonshot AI has released Kimi K2.7 Code, an open-weights model built for programming tasks and agent-based coding workflows. The model is available on Hugging Face and positions itself as a budget-friendly alternative to GPT-5.5 and Claude Opus 4.8.
The pricing difference is dramatic. Kimi K2.7 Code charges $0.95 per million input tokens and $4.00 per million output tokens. For teams running heavy coding workflows, that makes it up to 12x cheaper than frontier Western models.
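To see what these per-token prices mean for an actual bill, here is a minimal cost sketch. The monthly token volumes are hypothetical. The roughly $11 per million input tokens used for the frontier models is the approximate "~$11+" figure from this article's comparison table, not an official list price, so the ratio covers input pricing only.

```python
# Published Kimi K2.7 Code list prices, in USD per million tokens.
KIMI_INPUT_PER_M = 0.95
KIMI_OUTPUT_PER_M = 4.00

# Approximate frontier input price from the comparison table (assumption, not official).
FRONTIER_INPUT_PER_M_APPROX = 11.0


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of a workload at the given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_per_m + (output_tokens / 1_000_000) * output_per_m


# Hypothetical monthly agentic workload: 500M input tokens, 50M output tokens.
monthly = workload_cost(500_000_000, 50_000_000, KIMI_INPUT_PER_M, KIMI_OUTPUT_PER_M)
print(f"Kimi K2.7 Code: ${monthly:,.2f}")  # $675.00

# How many times more the approximate frontier input price is.
ratio = FRONTIER_INPUT_PER_M_APPROX / KIMI_INPUT_PER_M
print(f"Input price ratio: {ratio:.1f}x")  # 11.6x
```

An input-price ratio of about 11.6x is where the "up to 12x" figure comes from; the real gap for a given team depends on its mix of input and output tokens.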
Benchmark Performance: Strong on Agents, Weaker on Pure Coding
Moonshot's new model improves on its predecessor K2.6 across the board. On the company's in-house Kimi Code Bench v2, performance jumped from 50.9 to 62.0. Program Bench scores rose from 48.3 to 53.6, and MLS Bench Lite climbed from 26.7 to 35.1.
But the model still trails GPT-5.5 and Claude Opus 4.8 on most standard coding benchmarks. GPT-5.5 scores 69.1 on Program Bench compared to K2.7 Code's 53.6. On Kimi Code Bench v2, GPT-5.5 hits 69.0 versus 62.0 for the Moonshot model.
Program Bench is a particularly demanding test. Agents must reproduce a program's behavior using only a compiled binary and its documentation. No source code access, no decompilation, no internet. It's a stress test for reasoning under constraints.

The picture changes on agent-focused benchmarks. K2.7 Code scores 76.0 on MCP Atlas (up from 69.4 on K2.6) and 81.1 on MCPMark Verified (up from 72.8). That MCPMark score is notable. It beats Claude Opus 4.8's 76.4 on a benchmark that tests AI agents across five real-world software environments: Notion, GitHub, file systems, Postgres databases, and browser automation via Playwright.
GPT-5.5 still leads on MCPMark Verified with 92.9. But for teams that can accept a gap of roughly 12 points in exchange for up to 12x lower costs, K2.7 Code's agent performance makes it a credible option.
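MCPMark, as its name suggests, evaluates agents that act through Model Context Protocol (MCP) servers. For readers new to MCP, the sketch below shows the shape of a single tool invocation: a JSON-RPC 2.0 `tools/call` request. The `query` tool and its `sql` argument are hypothetical; real tool names and arguments depend on the MCP server in use, and this is not the benchmark's own harness.

```python
import json


def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool on a Postgres MCP server; actual names vary by server.
request = mcp_tool_call(1, "query", {"sql": "SELECT count(*) FROM orders;"})
print(request)
```

An agent solving a benchmark task issues many such calls in sequence, reading each result before deciding on the next one, which is why these benchmarks stress multi-step reasoning rather than single-shot code generation.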
Architecture: One Trillion Parameters, 32 Billion Active
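Kimi K2.7 Code uses a Mixture-of-Experts (MoE) architecture. The model has one trillion total parameters, but only 32 billion are active per token. A router selects eight experts per token from a pool of 384. Because each token passes through only a small slice of the network, the model keeps the capacity of a trillion-parameter model while its per-token compute is closer to that of a 32-billion-parameter dense model, which lowers inference costs.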
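The routing step can be illustrated with a toy top-k gate. This is a minimal sketch of generic softmax top-k routing using the reported numbers (384 experts, 8 per token); the random logits stand in for a learned router, and Moonshot's actual gating function may score and normalize experts differently.

```python
import math
import random

NUM_EXPERTS = 384  # expert pool size reported for K2.7 Code
TOP_K = 8          # experts selected per token


def route_token(router_logits: list[float], k: int = TOP_K) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    max_logit = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - max_logit) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


random.seed(0)
# Stand-in router scores for one token (a real router computes these from the token's hidden state).
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route_token(logits)
print([idx for idx, _ in chosen])           # indices of the 8 experts this token uses
print(round(sum(w for _, w in chosen), 6))  # gate weights sum to 1.0

# Share of total parameters touched per token: 32B of 1T.
print(f"Active fraction: {32e9 / 1e12:.1%}")  # 3.2%
```

Only the selected experts run their feed-forward computation for that token, and their outputs are mixed using the gate weights; the other 376 experts sit idle for that token.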
Context length is 256,000 tokens, enough for deep repository analysis. The model is multimodal and can process images and video alongside text.
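Whether a codebase fits in that window can be estimated before sending it. This sketch uses a rough characters-per-token heuristic (4 characters per token is an assumption, not Moonshot's tokenizer) and reserves a hypothetical 16,000 tokens for the model's reply; for exact counts you would run the model's own tokenizer.

```python
import math

CONTEXT_WINDOW = 256_000  # tokens, as reported for K2.7 Code


def estimated_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not the real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(files: dict[str, str], reserve_for_output: int = 16_000) -> bool:
    """True if all file contents plus an output reserve fit in the context window."""
    used = sum(estimated_tokens(src) for src in files.values())
    return used + reserve_for_output <= CONTEXT_WINDOW


# Hypothetical repository: 300 files of about 2,000 characters each (~150,000 tokens).
repo = {f"module_{n}.py": "x" * 2_000 for n in range(300)}
print(fits_in_context(repo))  # True
```

Larger repositories would need to be filtered or chunked, for example by sending only the files relevant to a change.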
“By optimizing the reasoning phase, we've reduced token overhead by 30%, making agent-based coding workflows viable at scale without the traditional cost barriers.”
— Moonshot AI Lead Engineer
That 30% reduction in reasoning tokens directly affects the bill for agentic tasks, where models often burn through tokens on internal chain-of-thought steps. For workflows that involve multi-file debugging or repository-scale refactoring, the savings compound quickly.
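A quick calculation shows how a 30% cut in reasoning tokens flows through to the output bill. This sketch assumes reasoning tokens are billed at the $4.00 output rate, and the per-task token counts are hypothetical.

```python
OUTPUT_PRICE_PER_M = 4.00  # Kimi K2.7 Code output price, USD per million tokens


def output_cost(reasoning_tokens: int, answer_tokens: int, reasoning_cut: float = 0.0) -> float:
    """USD output cost, assuming reasoning tokens are billed at the output rate."""
    billed = reasoning_tokens * (1 - reasoning_cut) + answer_tokens
    return billed / 1_000_000 * OUTPUT_PRICE_PER_M


# Hypothetical agent task: 40,000 reasoning tokens and 10,000 answer tokens,
# repeated over 1,000 tasks.
before = 1_000 * output_cost(40_000, 10_000)
after = 1_000 * output_cost(40_000, 10_000, reasoning_cut=0.30)
print(f"${before:,.2f} -> ${after:,.2f}")  # $200.00 -> $152.00
```

Because the answer tokens are unchanged here, a 30% cut in reasoning tokens lowers the output bill by 24%, not 30%. The more a task's output is dominated by reasoning, the closer the saving gets to the full 30%.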
Developer Reception
Early community reaction has been positive. Discussions on Hacker News and Reddit's r/LocalLLaMA show enthusiasm for the model's 60.4% score on SWE-bench Verified. Developers are testing its ability to handle repository-scale refactoring via the Kimi Code CLI.
There's also existing commercial traction. Cursor, the coding tool provider, resells a modified version of the Kimi model. That suggests real-world validation beyond benchmark performance.
When to Use K2.7 Code (and When Not To)
Moonshot is clear about the model's scope. K2.7 Code is optimized for long-running, complex software engineering tasks. For general tasks outside coding, the company still recommends the earlier K2.6 model.
The tradeoff is straightforward. If your workload involves heavy agent-based coding, repository-scale changes, or MCP integrations, K2.7 Code offers strong performance at a fraction of frontier pricing. If you need the absolute best scores on pure coding benchmarks, GPT-5.5 remains the leader.
| Model | Program Bench | MCPMark Verified | Input Cost (per 1M tokens) |
|---|---|---|---|
| Kimi K2.7 Code | 53.6 | 81.1 | $0.95 |
| Claude Opus 4.8 | — | 76.4 | ~$11+ |
| GPT-5.5 | 69.1 | 92.9 | ~$11+ |
Frequently Asked Questions
How much does Kimi K2.7 Code cost compared to GPT-5.5?
Kimi K2.7 Code charges $0.95 per million input tokens and $4.00 per million output tokens. This is up to 12x cheaper than GPT-5.5 and Claude Opus 4.8.
Is Kimi K2.7 Code open source?
Yes, the model is available as open-weights on Hugging Face, allowing developers to download and run it locally or on their own infrastructure.
How does Kimi K2.7 Code perform on coding benchmarks?
It scores 53.6 on Program Bench and 62.0 on Kimi Code Bench v2, trailing GPT-5.5's 69.1 and 69.0 respectively. However, it beats Claude Opus 4.8 on MCPMark Verified with 81.1 vs 76.4.
What is the context window for Kimi K2.7 Code?
The model supports a 256,000 token context window, enabling deep repository analysis and long-running coding tasks.
Should I use Kimi K2.7 Code for non-coding tasks?
Moonshot AI recommends using the earlier K2.6 model for general tasks outside coding. K2.7 Code is optimized specifically for software engineering and agent workflows.
Source: The Decoder / Matthias Bastian
Manaal Khan
Tech & Innovation Writer