All posts
Trending Tech

Sakana Fugu claims to beat Claude: what the benchmarks show

Huma Shazia23 June 2026 at 6:02 pm5 min read
Sakana Fugu claims to beat Claude: what the benchmarks show

Key Takeaways

Sakana Fugu claims to beat Claude: what the benchmarks show
Source: Tech-Economic Times
  • Sakana Fugu orchestrates multiple AI models including GPT, Claude, and Gemini rather than being a standalone LLM
  • The system claims to match or exceed frontier models on benchmarks like SWE Bench Pro and GPQA-D
  • Pricing starts at $20/month for subscriptions or $5 per million input tokens for API access

Tokyo-based Sakana AI launched Fugu, a multi-agent orchestration system that claims to match or exceed leading frontier models on coding and reasoning benchmarks. The catch: Fugu isn't a foundation model. It's a coordinator that routes tasks to a hidden pool of AI models, including Claude Opus, GPT, and Gemini, then assembles their outputs.

The startup, valued at $2.65 billion after a $135 million Series B last November, argues that orchestrating multiple strong models beats any single model on complex tasks. That's a reasonable hypothesis. Whether Fugu's execution proves it is another question.

How does Sakana Fugu actually work?

Think of Fugu as a project manager that evaluates your prompt, breaks it into subtasks, and assigns each to the best available model. Instead of a fixed workflow, the system dynamically selects and coordinates specialized AI models. Users interact with a single OpenAI-compatible API while Fugu handles model routing in the background.

The architecture draws from two research papers, TRINITY and Conductor, presented at ICLR 2026. Sakana claims Fugu learns how to assemble expert agents and coordinate collaboration patterns rather than following predefined structures.

Two versions exist. Standard Fugu balances performance and latency for everyday coding, chatbots, and research. Fugu Ultra prioritizes answer quality with a larger agent pool, targeting workloads like cybersecurity analysis and patent research.

What do the benchmarks show?

Sakana claims Fugu Ultra matches or exceeds frontier models on SWE Bench Pro, LiveCodeBench, GPQA-D, and Humanity's Last Exam. These are legitimate, difficult benchmarks. SWE Bench Pro tests real-world software engineering tasks. GPQA-D measures graduate-level scientific reasoning.

Here's the asterisk: Fugu's performance depends entirely on the models in its agent pool. If it's routing queries to Claude Opus and GPT-4, it inherits their capabilities. The system adds value through smart orchestration, not through novel model weights.

Sakana explicitly states users cannot see which underlying models processed their queries. Model selection and routing are proprietary. That opacity makes independent verification harder.

Who built this system?

The pedigree is impressive. Sakana AI was founded in 2023 by David Ha, formerly of Google Brain, and Llion Jones, a co-author of the foundational 2017 paper "Attention Is All You Need" that introduced the transformer architecture powering every modern LLM.

The company name means "fish" in Japanese. Fugu refers to pufferfish. Their philosophy: nature favors the small and efficient over the large and wasteful. Instead of building ever-larger monolithic models, they're betting that coordinated specialists outperform generalists.

What does Fugu cost?

Subscriptions range from $20 to $200 per month. For API access, Fugu Ultra starts at $5 per million input tokens and $30 per million output tokens. That's competitive with direct access to frontier models, though you're paying for orchestration overhead on top of the underlying model costs.

Standard Fugu lets users configure agent participation for privacy and compliance. Fugu Ultra uses a fixed agent pool, presumably to guarantee maximum performance.

Is orchestration the future of AI systems?

Multi-agent systems aren't new. Researchers have explored ensemble approaches for years. What's different now: frontier models are good enough that routing tasks to the right specialist genuinely adds value. A coding model handles code. A reasoning model handles logic. A coordinator stitches it together.

The risk is dependency. If OpenAI or Anthropic changes API terms, pricing, or capabilities, Fugu's performance shifts with it. Sakana doesn't control the models it orchestrates.

The opportunity is real. Most enterprise AI workloads involve multiple steps. A system that intelligently routes each step to the best available model could deliver better results than any single model. Whether Fugu's orchestration is sophisticated enough to justify the premium remains to be proven at scale.

Frequently Asked Questions

Is Sakana Fugu a new AI foundation model?

No. Fugu is a multi-agent orchestration system that coordinates existing frontier models like Claude, GPT, and Gemini. It adds value through intelligent task routing rather than novel model training.

Which AI models does Sakana Fugu use?

Sakana does not disclose the specific models in Fugu's agent pool. The company confirms it includes frontier models like GPT, Claude Opus, and Gemini, but model selection is proprietary.

How much does Sakana Fugu cost?

Subscriptions range from $20 to $200 per month. API pricing for Fugu Ultra starts at $5 per million input tokens and $30 per million output tokens.

Does Fugu outperform Claude and GPT?

Sakana claims Fugu Ultra matches or exceeds frontier models on several benchmarks. However, Fugu's performance depends on the underlying models it orchestrates, so the comparison is indirect.

ℹ️

Logicity's Take

Fugu is a smart bet on a real limitation: no single model excels at everything. The orchestration approach could deliver genuine value for complex enterprise workloads. But the opacity around model selection creates verification problems, and the dependency on third-party APIs introduces strategic risk. If this works, expect Anthropic, OpenAI, and Google to build their own orchestration layers.

Also Read
Why a local AI model beats Claude and Gemini for home automation

Explores when orchestrated frontier models aren't the right choice

ℹ️

Need Help Implementing This?

Evaluating multi-agent AI systems for your enterprise? Our consulting team helps CTOs benchmark orchestration platforms against direct model access. Contact us to discuss your AI infrastructure strategy.

Source: Tech-Economic Times / ET

H

Huma Shazia

Senior AI & Tech Writer

Related Articles

Tesla's Remote Parking Feature: The Investigation That Didn't Quite Park Itself
Trending Tech·8 min

Tesla's Remote Parking Feature: The Investigation That Didn't Quite Park Itself

The US auto safety regulators have closed their investigation into Tesla's remote parking feature, but what does this mean for the future of autonomous driving? We dive into the details of the investigation and what it reveals about the technology. The National Highway Traffic Safety Administration found that crashes were rare and minor, but the investigation's closure doesn't necessarily mean the feature is completely safe.