Sakana Fugu claims to beat Claude: what the benchmarks show

Key Takeaways

- Sakana Fugu orchestrates multiple AI models including GPT, Claude, and Gemini rather than being a standalone LLM
- The system claims to match or exceed frontier models on benchmarks like SWE Bench Pro and GPQA-D
- Pricing starts at $20/month for subscriptions or $5 per million input tokens for API access
Tokyo-based Sakana AI launched Fugu, a multi-agent orchestration system that claims to match or exceed leading frontier models on coding and reasoning benchmarks. The catch: Fugu isn't a foundation model. It's a coordinator that routes tasks to a hidden pool of AI models, including Claude Opus, GPT, and Gemini, then assembles their outputs.
The startup, valued at $2.65 billion after a $135 million Series B last November, argues that orchestrating multiple strong models beats any single model on complex tasks. That's a reasonable hypothesis. Whether Fugu's execution proves it is another question.
How does Sakana Fugu actually work?
Think of Fugu as a project manager that evaluates your prompt, breaks it into subtasks, and assigns each to the best available model. Instead of a fixed workflow, the system dynamically selects and coordinates specialized AI models. Users interact with a single OpenAI-compatible API while Fugu handles model routing in the background.
The architecture draws from two research papers, TRINITY and Conductor, presented at ICLR 2026. Sakana claims Fugu learns how to assemble expert agents and coordinate collaboration patterns rather than following predefined structures.
Two versions exist. Standard Fugu balances performance and latency for everyday coding, chatbots, and research. Fugu Ultra prioritizes answer quality with a larger agent pool, targeting workloads like cybersecurity analysis and patent research.
What do the benchmarks show?
Sakana claims Fugu Ultra matches or exceeds frontier models on SWE Bench Pro, LiveCodeBench, GPQA-D, and Humanity's Last Exam. These are legitimate, difficult benchmarks. SWE Bench Pro tests real-world software engineering tasks. GPQA-D measures graduate-level scientific reasoning.
Here's the asterisk: Fugu's performance depends entirely on the models in its agent pool. If it's routing queries to Claude Opus and GPT-4, it inherits their capabilities. The system adds value through smart orchestration, not through novel model weights.
Sakana explicitly states users cannot see which underlying models processed their queries. Model selection and routing are proprietary. That opacity makes independent verification harder.
Who built this system?
The pedigree is impressive. Sakana AI was founded in 2023 by David Ha, formerly of Google Brain, and Llion Jones, a co-author of the foundational 2017 paper "Attention Is All You Need" that introduced the transformer architecture powering every modern LLM.
The company name means "fish" in Japanese. Fugu refers to pufferfish. Their philosophy: nature favors the small and efficient over the large and wasteful. Instead of building ever-larger monolithic models, they're betting that coordinated specialists outperform generalists.
What does Fugu cost?
Subscriptions range from $20 to $200 per month. For API access, Fugu Ultra starts at $5 per million input tokens and $30 per million output tokens. That's competitive with direct access to frontier models, though you're paying for orchestration overhead on top of the underlying model costs.
Standard Fugu lets users configure agent participation for privacy and compliance. Fugu Ultra uses a fixed agent pool, presumably to guarantee maximum performance.
Is orchestration the future of AI systems?
Multi-agent systems aren't new. Researchers have explored ensemble approaches for years. What's different now: frontier models are good enough that routing tasks to the right specialist genuinely adds value. A coding model handles code. A reasoning model handles logic. A coordinator stitches it together.
The risk is dependency. If OpenAI or Anthropic changes API terms, pricing, or capabilities, Fugu's performance shifts with it. Sakana doesn't control the models it orchestrates.
The opportunity is real. Most enterprise AI workloads involve multiple steps. A system that intelligently routes each step to the best available model could deliver better results than any single model. Whether Fugu's orchestration is sophisticated enough to justify the premium remains to be proven at scale.
Frequently Asked Questions
Is Sakana Fugu a new AI foundation model?
No. Fugu is a multi-agent orchestration system that coordinates existing frontier models like Claude, GPT, and Gemini. It adds value through intelligent task routing rather than novel model training.
Which AI models does Sakana Fugu use?
Sakana does not disclose the specific models in Fugu's agent pool. The company confirms it includes frontier models like GPT, Claude Opus, and Gemini, but model selection is proprietary.
How much does Sakana Fugu cost?
Subscriptions range from $20 to $200 per month. API pricing for Fugu Ultra starts at $5 per million input tokens and $30 per million output tokens.
Does Fugu outperform Claude and GPT?
Sakana claims Fugu Ultra matches or exceeds frontier models on several benchmarks. However, Fugu's performance depends on the underlying models it orchestrates, so the comparison is indirect.
Logicity's Take
Fugu is a smart bet on a real limitation: no single model excels at everything. The orchestration approach could deliver genuine value for complex enterprise workloads. But the opacity around model selection creates verification problems, and the dependency on third-party APIs introduces strategic risk. If this works, expect Anthropic, OpenAI, and Google to build their own orchestration layers.
Explores when orchestrated frontier models aren't the right choice
Need Help Implementing This?
Evaluating multi-agent AI systems for your enterprise? Our consulting team helps CTOs benchmark orchestration platforms against direct model access. Contact us to discuss your AI infrastructure strategy.
Source: Tech-Economic Times / ET
Huma Shazia
Senior AI & Tech Writer
Related Articles
Browse all
Robotaxi Companies Are Hiding How Often Humans Take the Wheel
Autonomous vehicle firms like Waymo and Tesla are under scrutiny for refusing to disclose how often remote operators step in to control their self-driving cars. A Senate investigation reveals major gaps in transparency, raising safety and accountability concerns.

Wisconsin Governor Throws a Wrench in Age Verification Plans
Wisconsin Governor Tony Evers has vetoed a bill that would have required residents to verify their age before accessing adult content online, citing concerns over privacy and data security. This move comes as several other states have already implemented similar age check requirements. The veto has significant implications for the future of online age verification.

Apple's App Store Empire Under Siege: The Battle for the Future of Tech
The long-running feud between Apple and Epic Games has reached a boiling point, with Apple preparing to take its case to the Supreme Court. The tech giant is fighting to maintain control over its App Store, while Epic Games is pushing for more freedom for developers. The outcome could have far-reaching implications for the entire tech industry.

Tesla's Remote Parking Feature: The Investigation That Didn't Quite Park Itself
The US auto safety regulators have closed their investigation into Tesla's remote parking feature, but what does this mean for the future of autonomous driving? We dive into the details of the investigation and what it reveals about the technology. The National Highway Traffic Safety Administration found that crashes were rare and minor, but the investigation's closure doesn't necessarily mean the feature is completely safe.


