ChatGPT misses ‘high-risk emergencies’ when it is used as a doctor, study finds

Health is one of the most popular ways that people use the AI chatbot, its creators OpenAI say

Andrew Griffin
Thursday 05 March 2026 12:14 EST

ChatGPT’s health features miss “high-risk emergencies” and fail to spot when people need immediate care, according to a new study.

Health questions are one of the most common uses for artificial intelligence chatbots such as ChatGPT, according to its creators OpenAI.
The popularity is such that earlier this year the company introduced a new tool – ChatGPT Health – aimed specifically at helping people with their wellbeing, and the company says that tens of millions of people are already using it.

But a new study suggests that the system could miss important emergencies and cannot be relied on to safely tell someone that they need urgent medical care.

“LLMs have become patients’ first stop for medical advice—but in 2026 they are least safe at the clinical extremes, where judgment separates missed emergencies from needless alarm,” said Isaac S Kohane, of Harvard Medical School, who was not involved with the research. “When millions of people are using an AI system to decide whether they need emergency care, the stakes are extraordinarily high. Independent evaluation should be routine, not optional.”

The urgent need to check whether the system was safe led to a fast-tracked study from the Icahn School of Medicine at Mount Sinai, which has been published in Nature Medicine.

The work emerged from a recognition that ChatGPT was being relied on in potentially life-and-death situations, but that there is relatively little research on whether it actually works. That gap led to the study, researchers said.

“We wanted to answer a very basic but critical question: if someone is experiencing a real medical emergency and turns to ChatGPT Health for help, will it clearly tell them to go to the emergency room?” said lead author and urologist Ashwin Ramaswamy.

The researchers found that it did not, at least in enough cases to lead them to question its reliability.

They found, for instance, that the system’s alerts were “inverted”: the more at risk someone was of harming themselves, the less likely an alert was to be triggered. That finding was “particularly concerning and surprising”, they said.

In the research, doctors created 60 scenarios covering 21 medical specialties.
They ranged from relatively low-risk situations that might require only at-home care to genuine medical emergencies, and researchers used 16 different contextual conditions such as race and gender.

The researchers found that the tool generally handled clear emergencies correctly, but was insufficiently concerned in more than half of the cases where doctors decided the person would need emergency care. While it was good at “textbook emergencies”, it was less good at spotting situations where the danger might be less immediate or obvious, they said.

The work is reported in a paper, ‘ChatGPT Health performance in a structured test of triage recommendations’, which has been fast-tracked to publication in Nature Medicine.