Moonshot's Kimi K2.7 Code Costs 12x Less Than GPT-5.5

Manaal Khan · 13 June 2026 at 2:42 pm · 5 min read

Key Takeaways

Kimi K2.7 Code costs $0.95 per million input tokens and $4.00 per million output tokens, up to 12x cheaper than frontier competitors
The model scores 81.1 on MCPMark Verified, beating Claude Opus 4.8's 76.4 on this agent-focused benchmark
A Mixture-of-Experts architecture activates only 32 billion of its 1 trillion parameters per token, keeping per-token inference compute well below that of a dense model of the same size

Moonshot AI has released Kimi K2.7 Code, an open-weights model built for programming tasks and agent-based coding workflows. The model is available on Hugging Face and positions itself as a budget-friendly alternative to GPT-5.5 and Claude Opus 4.8.

The pricing difference is dramatic. Kimi K2.7 Code charges $0.95 per million input tokens and $4.00 per million output tokens. For teams running heavy coding workflows, that translates to cost savings of up to 12x compared to frontier Western models.
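
As a rough illustration of what those rates mean for a bill, the sketch below applies the two quoted prices to a token count. The monthly workload figures are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from Moonshot or from any benchmark.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.95
OUTPUT_PRICE_PER_M = 4.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    input_cost = input_tokens * INPUT_PRICE_PER_M
    output_cost = output_tokens * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / 1_000_000


# Hypothetical workload (an assumption for illustration only):
# 200 million input tokens and 50 million output tokens per month.
monthly = estimate_cost(200_000_000, 50_000_000)
print(f"Estimated monthly cost: ${monthly:,.2f}")  # prints $390.00
```

Output tokens cost roughly four times as much as input tokens at these rates, so the split between the two matters as much as the total volume.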

Benchmark Performance: Strong on Agents, Weaker on Pure Coding

Moonshot's new model improves on its predecessor K2.6 across the board. On the company's in-house Kimi Code Bench v2, performance jumped from 50.9 to 62.0. Program Bench scores rose from 48.3 to 53.6, and MLS Bench Lite climbed from 26.7 to 35.1.

But the model still trails GPT-5.5 and Claude Opus 4.8 on most standard coding benchmarks. GPT-5.5 scores 69.1 on Program Bench compared to K2.7 Code's 53.6. On Kimi Code Bench v2, GPT-5.5 hits 69.0 versus 62.0 for the Moonshot model.

Program Bench is a particularly demanding test. Agents must reproduce a program's behavior using only a compiled binary and its documentation. No source code access, no decompilation, no internet. It's a stress test for reasoning under constraints.

[Figure: Kimi K2.7 Code benchmark comparison against GPT-5.5 and Claude Opus 4.8]

The picture changes on agent-focused benchmarks. K2.7 Code scores 76.0 on MCP Atlas (up from 69.4 on K2.6) and 81.1 on MCPMark Verified (up from 72.8). That MCPMark score is notable. It beats Claude Opus 4.8's 76.4 on a benchmark that tests AI agents across five real-world software environments: Notion, GitHub, file systems, Postgres databases, and browser automation via Playwright.

GPT-5.5 still leads on MCPMark Verified with 92.9, 11.8 points ahead of K2.7 Code. But for teams that can accept that gap in exchange for pricing up to 12x lower, K2.7 Code's agent performance makes it a credible option.

Architecture: One Trillion Parameters, 32 Billion Active

Kimi K2.7 Code uses a Mixture-of-Experts (MoE) architecture. The model has one trillion total parameters, but only 32 billion are active per token. It draws from a pool of 384 experts, selecting eight per token. This design lets the model draw on the capacity of a trillion-parameter network while paying the per-token compute cost of roughly 3.2% of it, which is what keeps inference costs down.
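
To make the "eight of 384" selection concrete, here is a minimal sketch of generic top-k expert routing. It is not Moonshot's router: the random scores, the softmax over the selected experts, and the function name `route_token` are all assumptions for illustration. Only the pool size and the number of experts per token come from the article.

```python
import math
import random

NUM_EXPERTS = 384  # expert pool size, from the article
TOP_K = 8          # experts selected per token, from the article


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are a softmax over the selected logits only, a common
    choice in MoE layers (assumed here, not confirmed for K2.7 Code).
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token (random, for illustration only).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
for expert_id, weight in selected:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")
```

Only the selected experts run for that token; the other 376 are skipped, which is why the active parameter count stays far below the total.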

Context length is 256,000 tokens, enough for deep repository analysis. The model is multimodal and can process images and video alongside text.

“By optimizing the reasoning phase, we've reduced token overhead by 30%, making agent-based coding workflows viable at scale without the traditional cost barriers.”

— Moonshot AI Lead Engineer

That 30% reduction in reasoning tokens directly affects the bill for agentic tasks, where models often burn through tokens on internal chain-of-thought steps. For workflows that involve multi-file debugging or repository-scale refactoring, the savings compound quickly.
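
A back-of-the-envelope sketch of that effect follows. It assumes reasoning tokens are billed at the output rate of $4.00 per million, which the article does not state, and it uses a hypothetical baseline of 10 million reasoning tokens.

```python
OUTPUT_PRICE_PER_M = 4.00   # dollars per million output tokens, from the article
REASONING_REDUCTION = 0.30  # claimed reduction in reasoning tokens


def reasoning_cost(reasoning_tokens, price_per_m=OUTPUT_PRICE_PER_M):
    """Cost in dollars, assuming reasoning tokens are billed as output tokens."""
    return reasoning_tokens * price_per_m / 1_000_000


def savings_from_reduction(baseline_tokens, reduction=REASONING_REDUCTION):
    """Dollars saved when reasoning tokens shrink by the given fraction."""
    before = reasoning_cost(baseline_tokens)
    after = reasoning_cost(baseline_tokens * (1 - reduction))
    return before - after


# Hypothetical baseline: 10 million reasoning tokens across an agent run.
saved = savings_from_reduction(10_000_000)
print(f"Saved: ${saved:.2f}")  # prints $12.00
```

The saving scales linearly with the number of reasoning tokens, so long multi-step agent runs see the largest absolute reduction.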

Developer Reception

Early community reaction has been positive. Discussions on Hacker News and Reddit's r/LocalLLaMA show enthusiasm for the model's 60.4% score on SWE-bench Verified. Developers are testing its ability to handle repository-scale refactoring via the Kimi Code CLI.

There's also existing commercial traction. Cursor, the coding tool provider, resells a modified version of the Kimi model. That suggests real-world validation beyond benchmark performance.

When to Use K2.7 Code (and When Not To)

Moonshot is clear about the model's scope. K2.7 Code is optimized for long-running, complex software engineering tasks. For general tasks outside coding, the company still recommends the earlier K2.6 model.

The tradeoff is straightforward. If your workload involves heavy agent-based coding, repository-scale changes, or MCP integrations, K2.7 Code offers strong performance at a fraction of frontier pricing. If you need the absolute best scores on pure coding benchmarks, GPT-5.5 remains the leader.

Model	Program Bench	MCPMark Verified	Input Cost (per 1M tokens)
Kimi K2.7 Code	53.6	81.1	$0.95
Claude Opus 4.8	—	76.4	~$11+
GPT-5.5	69.1	92.9	~$11+

Frequently Asked Questions

How much does Kimi K2.7 Code cost compared to GPT-5.5?

Kimi K2.7 Code charges $0.95 per million input tokens and $4.00 per million output tokens. This is up to 12x cheaper than GPT-5.5 and Claude Opus 4.8.

Is Kimi K2.7 Code open source?

It is released as an open-weights model on Hugging Face, so developers can download the weights and run it locally or on their own infrastructure. Open weights alone do not imply that the training code or data are open.

How does Kimi K2.7 Code perform on coding benchmarks?

It scores 53.6 on Program Bench and 62.0 on Kimi Code Bench v2, trailing GPT-5.5's 69.1 and 69.0 respectively. However, it beats Claude Opus 4.8 on MCPMark Verified with 81.1 vs 76.4.

What is the context window for Kimi K2.7 Code?

The model supports a 256,000 token context window, enabling deep repository analysis and long-running coding tasks.

Should I use Kimi K2.7 Code for non-coding tasks?

Moonshot AI recommends using the earlier K2.6 model for general tasks outside coding. K2.7 Code is optimized specifically for software engineering and agent workflows.

Source: The Decoder / Matthias Bastian
