Anthropic has equipped its Claude Opus 4 and 4.1 models with the ability to end conversations it deems persistently harmful or abusive. The new capability is part of Anthropic's research into AI model welfare and is intended to shield the models from toxic prompts and potential performance degradation. A model will end a conversation only as a last resort, after multiple attempts to redirect the user have failed.
When Claude ends a chat, the user will not be able to send any new messages in that conversation, but they can immediately start a new one. Anthropic has stated that this feature will be reserved for extreme edge cases, such as requests for sexual content involving minors or attempts to solicit information that would enable large-scale violence. Most users are unlikely to experience Claude cutting a conversation short, even when discussing highly controversial topics.
This move is part of Anthropic's broader research program on AI welfare. While anthropomorphising AI models remains a matter of ongoing debate, the company believes that allowing a model to exit a potentially distressing interaction is a low-cost way to manage risks to AI welfare.