AI Model Outperforms ER Doctors in Harvard Emergency Triage Study

Study in Science found OpenAI's o1 reasoning model exceeded physician accuracy on ER triage and diagnoses, prompting calls for clinical trials and oversight.

Overview

A summary of the key points of this story verified across multiple sources.

1. A study by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center, published in the journal Science, found that OpenAI's o1 reasoning model matched or outperformed emergency physicians on diagnostic accuracy using electronic health records.

2. The team tested o1 across six experiments combining standardized clinical cases and a real-world sample that included 76 emergency patients, finding the model handled uncertainty and fragmented text data especially well at triage.

3. The study's authors said the results do not justify replacing clinicians and urged rigorous evaluation, including randomized clinical trials and oversight, before such models are deployed in clinical workflows.

4. At triage, the model identified an exact or very close diagnosis in about 67% of cases, versus roughly 50% to 55% for two expert physicians. As more data arrived, its accuracy rose to about 81% to 82%, versus 70% to 79% for the physicians.

5. Researchers and independent experts called for testing safety, equity, multimodal integration and real-time trials to determine how AI might act as a supervised aid alongside clinicians rather than a replacement.

Written using shared reports from 5 sources.

Analysis


Center-leaning sources frame the story as a notable technical advance tempered by caution. Headlines and lead data emphasize AI outperforming physicians, but editorial choices prioritize safety and limits by foregrounding caveats (e.g., "didn’t formally measure hallucination rate"), regulatory concerns, and independent experts' warnings about hallucinations and malicious behaviors.