Probably raises $9M to cut AI hallucinations with smaller models

Huma Shazia · 16 June 2026 at 6:57 pm · 4 min read

Key Takeaways

Probably raised $9M seed funding from Andreessen Horowitz to build AI tools that prevent hallucinations from reaching users
The company's validator system lets it run on models that its founder describes as "four classes weaker" than frontier systems, cutting token costs significantly
Founder Peter Elias argues better harness engineering reduces the need for powerful models by eliminating ambiguity before inference

Probably, a new AI startup, has raised $9 million in seed funding from Andreessen Horowitz to tackle one of the most persistent problems in AI deployment: hallucinations. The company's approach bypasses brute-force model scaling in favor of deterministic validation systems that catch errors before they ever reach end users.

Founder Peter Elias says Probably aims to hit 99.99% accuracy, the kind of reliability standard that deterministic software achieves routinely but AI systems struggle to match. The bet is that the path to reliable AI isn't necessarily bigger models. It's better scaffolding around smaller ones.

How does Probably catch AI hallucinations?

The company's first product is a data science tool that answers questions from complex datasets. Each response includes a citation and an audit trail showing how the answer was generated. That much is table stakes for enterprise AI tools in 2026.

What's different is what Elias calls the "data science mech suit." The LLM's initial responses pass through a deterministic validator system that checks results against the actual dataset. Any answer that doesn't match gets bounced back. The model has been trained against this validator, so the entire pipeline optimizes for fast, accurate responses rather than just plausible-sounding ones.
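The generate, validate, and bounce-back loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Probably's implementation: the toy `SALES` dataset, the `validate_total` check, the `answer_with_validation` loop, and the `flaky_model` stand-in for an LLM are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical toy dataset; the article does not describe Probably's data model.
SALES = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 50},
]


@dataclass
class Answer:
    value: int
    attempts: int
    audit_trail: List[str] = field(default_factory=list)


def validate_total(region: str, claimed: int) -> bool:
    """Deterministic check: recompute the total directly from the dataset."""
    actual = sum(row["revenue"] for row in SALES if row["region"] == region)
    return claimed == actual


def answer_with_validation(
    region: str,
    model: Callable[[str, int], int],
    max_attempts: int = 3,
) -> Answer:
    """Ask the model, validate against the data, and bounce back mismatches."""
    trail: List[str] = []
    for attempt in range(1, max_attempts + 1):
        claimed = model(region, attempt)
        accepted = validate_total(region, claimed)
        verdict = "accepted" if accepted else "rejected"
        trail.append(f"attempt {attempt}: model said {claimed}, validator {verdict}")
        if accepted:
            return Answer(value=claimed, attempts=attempt, audit_trail=trail)
    # Nothing unvalidated is ever returned to the caller.
    raise ValueError(f"no validated answer after {max_attempts} attempts")


def flaky_model(region: str, attempt: int) -> int:
    """Stand-in for an LLM: wrong on the first try, right on the second."""
    return 999 if attempt == 1 else 170


result = answer_with_validation("north", flaky_model)
for line in result.audit_trail:
    print(line)
```

The design point in this sketch is that the validator, not the model, decides what reaches the user: a wrong answer costs a retry instead of surfacing as a hallucination, and the audit trail records each attempt.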

"What we learned building this was that the better your harness engineering is, the weaker the model can be," Elias told TechCrunch. "If you can refine the context enough, the model does not have to work very hard to do the right thing. Basically, it's an exercise in reducing ambiguity."

Why smaller models matter for cost control

Probably's validator approach produces a practical side effect: the system runs on AI models that Elias describes as "four classes weaker than the frontier models." That's a significant gap. Frontier models today require data center infrastructure and rack up substantial token costs. Probably's tool can run on local hardware, a desktop computer rather than a server farm.

The timing matters. Token costs have been rising, and many enterprise customers are rethinking their AI budgets. A system that delivers comparable accuracy on cheaper infrastructure addresses a real pain point.

Elias argues the approach extends beyond data science. Accounting, medical services, and other precision-sensitive domains could benefit from the same validator architecture. The common thread is use cases where a 95% correct answer isn't good enough and errors carry real consequences.
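To put the gap between 95% accuracy and the 99.99% target in concrete terms, a short calculation helps. The workload of 10,000 queries is an assumed figure chosen only for illustration, and `expected_errors` is a hypothetical helper; neither comes from the company.

```python
def expected_errors(accuracy: float, queries: int) -> float:
    """Expected number of wrong answers at a given accuracy rate."""
    return (1.0 - accuracy) * queries


QUERIES = 10_000  # assumed workload size, for illustration only

for accuracy in (0.95, 0.9999):
    errors = round(expected_errors(accuracy, QUERIES), 2)
    print(f"{accuracy:.2%} accurate: about {errors} wrong answers per {QUERIES:,} queries")
```

At 95% accuracy that is roughly 500 wrong answers per 10,000 queries, against about 1 at 99.99%, which is the difference that matters in fields like accounting or medicine.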

Are big AI labs ignoring this problem?

Elias takes a pointed stance on why the major AI labs haven't pursued this direction. "I think it's really interesting that the big AI labs have not even attempted to do this," he said. "They're incentivized not to, because they make money the more times you have to correct the model."

That's a strong claim. OpenAI, Anthropic, and Google have all invested in reducing hallucinations through techniques like retrieval-augmented generation and chain-of-thought prompting. But Elias is pointing at something structural: if your business model depends on token volume, you don't necessarily benefit from systems that get answers right on the first try.

Whether that fully explains the gap is debatable. But there is a clear market opening for startups building accuracy-first tooling, particularly as enterprise adoption moves from experimentation to production.

What comes next for Probably?

The $9 million seed round gives Probably runway to expand beyond its initial data science tool. Elias has signaled interest in accounting and medical applications, both fields where regulatory requirements demand audit trails and error rates have real consequences.

The broader question is whether Probably's approach can scale to more open-ended tasks. Data science queries against structured datasets are a relatively constrained domain. Extending the same validator logic to freeform text generation or multi-step reasoning would require new architectures. The company hasn't detailed plans for those use cases.

Frequently Asked Questions

What is Probably AI and what does it do?

Probably is a startup that builds AI tools designed to prevent hallucinations and factual errors. Its first product is a data science tool that uses a deterministic validator to check LLM responses against the underlying dataset before showing results to users.

How much funding did Probably raise?

Probably raised $9 million in seed funding from Andreessen Horowitz in June 2026.

How does Probably reduce AI hallucinations?

The company uses a validator harness that checks LLM outputs against the actual dataset. Results that don't match get rejected. The model is trained against this validator, optimizing the whole system for accuracy.

Can Probably's approach work for other industries?

Founder Peter Elias says the same validator architecture could extend to accounting, medical services, and other precision-sensitive use cases where errors carry significant consequences.

Logicity's Take

Probably's bet inverts the conventional AI scaling logic. Instead of throwing more parameters at reliability, it treats accuracy as an engineering problem around the model rather than inside it. If the approach holds up in production, it offers a template for cost-conscious enterprises: pair modest models with tight validation and skip the frontier model premium entirely. The open question is whether validator harnesses can generalize beyond structured data queries to messier, real-world use cases.

Source: TechCrunch / Russell Brandom
