Mistral's Leanstral 1.5 scores 100% on formal math benchmark

Huma Shazia · July 4, 2026 at 12:47 PM · 4 min read

Key Takeaways

Leanstral 1.5 achieves 100% on miniF2F and solves 587 of 672 PutnamBench problems
The model is open-source under Apache 2.0, available on Hugging Face with a free API
In practical testing, it caught 5 previously unknown bugs across 57 open-source repositories

Mistral AI released Leanstral 1.5, an open-source model built specifically for formal verification in Lean 4. The model scores 100% on miniF2F, a benchmark covering math problems from high school through Math Olympiad difficulty, and solves 587 of 672 problems on PutnamBench. It's available now under Apache 2.0 via Hugging Face and a free API.

Lean 4 is a programming language and interactive theorem prover originally developed by Leonardo de Moura at Microsoft Research. It lets mathematicians and developers write proofs that a computer checks step by step, so correctness doesn't hinge on a human reviewer spotting every mistake. That makes it valuable for cryptography, aerospace software, and financial systems, where a single bug can be catastrophic.
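To make "machine-checked" concrete, here is a minimal Lean 4 sketch. It assumes only Lean 4's core library (`Nat.add_comm` and the `decide` tactic ship with core); the name `sum_comm` is made up for illustration.

```lean
-- A theorem statement followed by its proof. Lean's kernel checks the proof;
-- if it did not establish exactly this statement, the file would not compile.
theorem sum_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Tactic-style proof: `decide` settles a decidable claim by evaluating it.
example : 2 ^ 10 = 1024 := by decide
```

Models like Leanstral are aimed at producing the proof part (the text after `:=`) for statements far harder than these.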

How does Leanstral 1.5 perform on math benchmarks?

The numbers are striking. On miniF2F, which tests formal reasoning across a range of difficulty levels, Leanstral 1.5 hits a perfect score. PutnamBench pulls 672 problems from the William Lowell Putnam Mathematical Competition, one of the most prestigious undergraduate math contests in North America. Leanstral solves 587 of them, an 87.4% success rate.

On FATE-H and FATE-X, algebra benchmarks built from master's-level and doctoral-level problems in group theory and ring theory, the model scores 87% and 34%, respectively. Among open-source models, Leanstral 1.5 leads on PutnamBench, FATE-H, and FATE-X. Only the closed-source Aleph Prover beats it on PutnamBench.

Real bugs in real code

Mistral trained the model primarily for mathematics, but the skills behind formal verification carry over to software. In a hands-on test, the company scanned 57 open-source repositories, and Leanstral 1.5 caught five previously unknown bugs, including an overflow bug in the Rust library varinteger.

Finding previously unknown bugs in real-world code isn't a party trick. It shows that formal verification models can do practical security work. For teams maintaining critical infrastructure, a model that catches integer overflows before they ship could prevent real incidents.
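The article doesn't describe what the varinteger bug actually was, so what follows is a hypothetical illustration of the bug class, not Mistral's finding and not varinteger's code. Variable-length integer (varint) decoding is one place where this kind of overflow can hide. This is a minimal Rust sketch, assuming an unsigned LEB128-style format (7 payload bits per byte, high bit set when more bytes follow); `decode_varint` and `encode_varint` are illustrative names.

```rust
/// Hypothetical example, not taken from varinteger: decode an unsigned
/// LEB128-style varint. Returns Some((value, bytes_read)), or None if the
/// input is truncated or the encoded value does not fit in a u64.
fn decode_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate() {
        // Bit offset of this byte's 7-bit payload.
        let shift = 7 * (i as u32);
        // A u64 needs at most 10 bytes. A naive decoder keeps going, and
        // `chunk << shift` with shift >= 64 panics in debug builds and
        // masks the shift amount in release builds, yielding a wrong value.
        if shift >= 64 {
            return None;
        }
        let chunk = u64::from(byte & 0x7f);
        // In the 10th byte (shift 63) only the lowest payload bit fits;
        // any higher bit would be shifted out and silently lost.
        if shift == 63 && chunk > 1 {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // input ended before a terminating byte
}

/// Encode `value` as an unsigned LEB128-style varint, appending to `out`.
fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: high bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(u64::MAX, &mut buf);
    // Prints: Some((18446744073709551615, 10))
    println!("{:?}", decode_varint(&buf));
}
```

The two explicit bounds checks are exactly the kind of property a formal tool can be asked to prove always holds, rather than hoping a test happens to feed the decoder an over-long input.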

Training approach and availability

Mistral combined mid-training, supervised fine-tuning, and reinforcement learning to build Leanstral 1.5. The company hasn't disclosed the base model or the composition of its training data, but the Apache 2.0 license lets anyone use, modify, and deploy the model commercially, subject only to the license's notice and attribution requirements.

The model is hosted on Hugging Face and accessible through a free API. That's a low barrier for teams who want to experiment with formal verification without committing infrastructure.

Where formal verification fits in AI development

Most AI models optimize for natural language fluency or code generation speed. Formal verification is different. A proof assistant doesn't accept "probably correct." Either the proof checks out, or it doesn't. That rigor makes Lean-focused models useful for verifying smart contracts, safety-critical embedded systems, and mathematical research, where errors compound.
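A small illustration of that all-or-nothing behavior, as a Lean 4 sketch. It assumes a recent Lean 4 toolchain in which the `omega` linear-arithmetic tactic ships with core; the theorem names are made up for the example.

```lean
-- If two values each fit in a byte (< 256), their sum fits in 9 bits (< 512),
-- so storing it in, say, a 16-bit integer cannot overflow.
theorem byte_sum_fits (a b : Nat) (ha : a < 256) (hb : b < 256) :
    a + b < 512 := by
  omega

-- A slightly-too-strong claim does not check: with a = b = 255 the sum is 510.
-- Uncommenting the next two lines makes Lean report an error, not a theorem.
-- theorem byte_sum_too_tight (a b : Nat) (ha : a < 256) (hb : b < 256) :
--     a + b < 256 := by omega
```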

Mistral's open-source approach matters here. Formal verification has historically required deep expertise and specialized tooling. An Apache 2.0 model that performs at this level opens the field to smaller teams and independent researchers.


Logicity's Take

Leanstral 1.5 sits in a different lane than code completion tools like GitHub Copilot or Cursor. It's not trying to write your code faster. It's trying to prove your code correct. For AI builders working on anything where bugs create liability, this is worth experimenting with now. The free API takes usage fees out of the evaluation phase, leaving engineering time as the main cost. The real question: can teams integrate formal verification into existing CI/CD pipelines without slowing down releases? If Mistral or the community ships tooling that makes that workflow smooth, formal verification could move from niche academic practice to standard engineering discipline.

Frequently Asked Questions

What is Lean 4 and why does it matter for AI?

Lean 4 is a programming language and theorem prover that lets developers write proofs a computer can verify. AI models trained on Lean can automate formal verification of math and software, catching bugs that testing misses.

Is Leanstral 1.5 free to use commercially?

Yes. It's released under Apache 2.0, which permits commercial use as long as you meet the license's notice and attribution requirements. The model is available on Hugging Face and through a free API.

How does Leanstral 1.5 compare to closed-source alternatives?

It leads all open-source models on PutnamBench, FATE-H, and FATE-X. Only the closed-source Aleph Prover outperforms it on PutnamBench.

Can Leanstral 1.5 find bugs in production code?

Mistral tested it on 57 open-source repositories and it found five previously unknown bugs, including an overflow bug in a Rust library.

Need Help Implementing This?

If you're exploring formal verification for your codebase or want help integrating Leanstral 1.5 into your workflow, reach out to Logicity's consulting team. We work with AI builders to evaluate and deploy models that fit their stack.

Source: The Decoder / Matthias Bastian
