What happened
A study led by Rebecca Payne found that widely available large language model (LLM) chatbots failed to improve patients' health decisions in realistic scenarios. Participants who used chatbots were less likely to identify the correct condition or to choose the right place to seek care. Yet when the same scenarios were given to the chatbots directly, without a human intermediary, the models identified relevant conditions and suggested appropriate care far more reliably. The gap stemmed from communication failures: users overlooked correct diagnoses the chatbots offered, supplied incomplete information, or had their details misinterpreted by the chatbot.
Why it matters
Real-world performance data is critical for deploying AI in high-stakes healthcare settings. Current AI evaluations, often based on benchmarks or model-to-model interactions, do not capture the complexities of human-machine communication. For healthcare providers and policymakers, the findings suggest AI's immediate role should be supportive, such as summarising patient records or drafting clinical notes, rather than front-line diagnosis or patient triage. Medical practice requires human connection, tailored communication, and nuanced judgement, which current chatbots lack.