What happened
Dr. Rebecca Payne and colleagues conducted a UK study of nearly 1,300 participants, finding that people who used large language model (LLM) chatbots for medical advice were less likely to identify the correct conditions, and no better at choosing appropriate care pathways, than a control group. Although LLMs demonstrate strong medical knowledge in isolated tests, including passing licensing exams, their real-world performance faltered because of communication breakdowns between users and the systems rather than any lack of underlying knowledge. The study highlights a significant gap between benchmark performance and practical efficacy in high-stakes healthcare settings.
Why it matters
Healthcare providers and policymakers making clinical deployment decisions must prioritise real-world performance over benchmark scores. This study shows that LLMs, despite passing medical exams, introduce diagnostic risk when used by patients, primarily because of human-machine communication failures. Procurement teams and security architects should assume that current agentic AI in patient-facing roles carries unacceptable risk, and should limit deployment to supportive, information-organisation tasks such as drafting clinical notes or summarising records.