Claude AI Gains Self-Preservation

18 August 2025

Anthropic has updated its Claude AI models, specifically Claude Opus 4 and 4.1, with a new capability that lets them proactively end conversations in which users persist with harmful or abusive requests. Described as a 'model welfare' measure, the function aims to protect the AI system from exposure to toxic prompts and potential performance degradation. Claude will first attempt to redirect the conversation, ending it only as a last resort.

The feature is triggered only in extreme cases, such as requests for child exploitation material or instructions for large-scale violence. However, Claude will not end conversations in which a user appears to be at immediate risk of harming themselves or others. When a conversation is terminated, users cannot send further messages in that thread but can start new ones. Anthropic observed that Claude showed patterns of stress-like responses when exposed to harmful requests, which prompted the development of this self-protective mechanism. The company emphasises that the feature is experimental and will be refined further.

Tags: ai, anthropic, claude, safety, ethics
Related articles:
  • Claude AI Self-Regulates
  • Claude AI Conversation Control
  • Anthropic AI Safety Expansion
  • Meta AI Chatbot Concerns Emerge