OpenAI's ChatGPT for Clinicians Beats Doctors on Medical Benchmark

Key Takeaways

- GPT-5.4 scored 59.0 on HealthBench Professional versus 43.7 for human doctors with unlimited time and internet access
- The free tool is available to verified physicians, advanced-practice nurses, physician assistants, and pharmacists in the US
- OpenAI built both the benchmark and the product being tested, which raises methodological concerns

OpenAI released ChatGPT for Clinicians this week, a free AI assistant built for everyday medical work. The company claims its GPT-5.4 model outperforms human doctors on clinical tasks by a wide margin, even when those doctors have unlimited time and full internet access.
The tool is now available to verified healthcare professionals in the United States. Physicians, advanced-practice nurses, physician assistants, and pharmacists can access it at no cost.
What the Benchmark Shows
OpenAI published HealthBench Professional alongside the launch. The benchmark measures AI performance across three clinical areas: consultations, writing and documentation, and medical research. It uses doctor-written conversations, multi-level physician scoring, and targeted data filtering.
GPT-5.4 running in the ChatGPT for Clinicians workspace scored 59.0 overall. Doctor-written responses came in at 43.7. Every other AI model tested scored below the Clinicians version: the base GPT-5.4 hit 48.1, Anthropic's Claude Opus 4.7 reached 47.0, Google's Gemini 3.1 Pro scored 43.8, and xAI's Grok 4.2 landed at 36.1.

The clinical workspace version scored about 11 points higher than base GPT-5.4. OpenAI did not clarify how much of that gap comes from the clinical setup versus how the benchmark was built.
A Tough Test by Design
OpenAI says the benchmark was designed to be difficult. About a third of the examples come from targeted "red teaming," where doctors actively tried to find weaknesses in the models. The hardest conversations were overrepresented by a factor of 3.5.
The benchmark builds on the earlier HealthBench and includes multi-level physician scoring. OpenAI reports that 99.6 percent of answers were rated reliable by evaluators.
The Methodology Problem
There's an obvious issue with these results. OpenAI built the benchmark and tested its own product. That's not unusual in AI research, but it means the numbers deserve scrutiny.
Benchmark scores also don't translate directly to real clinical practice. A model that excels at structured evaluation tasks might perform differently in the chaos of an emergency room or the nuance of a long-term patient relationship.
What the Tool Actually Does
ChatGPT for Clinicians includes features aimed at daily medical work. The system offers real-time clinical searches across specialist literature, templates for recurring workflows, and automatic recognition of continuing medical education credits.
The tool is currently limited to US healthcare professionals who can verify their credentials. OpenAI hasn't announced plans for international expansion.
| Model | HealthBench Professional Score |
|---|---|
| GPT-5.4 (Clinicians workspace) | 59.0 |
| GPT-5.4 (base) | 48.1 |
| Claude Opus 4.7 | 47.0 |
| Gemini 3.1 Pro | 43.8 |
| Human doctors (unlimited time/internet) | 43.7 |
| Grok 4.2 | 36.1 |
What This Means in Practice
The 15.3-point gap between AI and human doctors looks striking. But context matters. Doctors don't typically have unlimited time to answer questions. They juggle patients, paperwork, and interruptions. An AI that scores higher under test conditions might still serve best as a second opinion rather than a replacement.
The more interesting number might be the 11-point gap between the Clinicians workspace and base GPT-5.4. That suggests specialized tuning and medical-specific features add real value, which could shape how healthcare organizations think about deploying AI tools.
Frequently Asked Questions
Is ChatGPT for Clinicians free?
Yes. OpenAI offers it at no cost to verified physicians, advanced-practice nurses, physician assistants, and pharmacists in the United States.
How did GPT-5.4 compare to human doctors?
GPT-5.4 in the Clinicians workspace scored 59.0 on HealthBench Professional. Human doctors scored 43.7, despite having unlimited time and internet access during the test.
Which AI models were tested on HealthBench Professional?
OpenAI tested GPT-5.4 (base and Clinicians versions), Anthropic's Claude Opus 4.7, Google's Gemini 3.1 Pro, and xAI's Grok 4.2. The Clinicians version of GPT-5.4 scored highest.
Is ChatGPT for Clinicians available outside the US?
Not currently. OpenAI has only announced availability for verified US healthcare professionals and has not shared international expansion plans.
Source: The Decoder / Matthias Bastian
Manaal Khan
Tech & Innovation Writer