Claude AI Gains Self-Preservation

18 August 2025

Anthropic has updated its Claude AI models, specifically Claude Opus 4 and 4.1, with a new capability that lets them proactively end conversations in which users persist with harmful or abusive requests. Described as a 'model welfare' measure, the function is intended to protect the AI system from exposure to toxic prompts and potential performance degradation. Claude will first attempt to redirect the conversation, ending it only as a last resort.

The feature is triggered only in extreme cases, such as requests for child exploitation material or instructions for large-scale violence. Claude will not, however, end conversations in which a user appears to be at immediate risk of self-harm or of endangering others. When a conversation is terminated, the user cannot send further messages in that thread but can start a new one. Anthropic observed that Claude showed patterns of stress-like responses when exposed to harmful requests, which prompted the development of this self-protective mechanism. The company emphasises that the feature is experimental and will be refined further.
