Anthropic Alleges Model Distillation

24 February 2026

What happened

Anthropic alleges "industrial-scale distillation attacks" on its Claude reasoning models by the Chinese AI companies DeepSeek, Moonshot AI, and MiniMax. According to Anthropic, these labs created over 24,000 fraudulent accounts and generated more than 16 million exchanges with Claude, extracting its capabilities to train and improve their own models. The company claims such illicit distillation can strip away safeguards, potentially feeding model capabilities into military, intelligence, and surveillance systems, and says the attacks are growing in both intensity and sophistication.

Why it matters

The alleged illicit distillation raises model-integrity and national-security risks that matter to security architects and national defence strategists. Anthropic claims the technique can remove model safeguards, potentially diverting advanced AI capabilities into foreign military and intelligence systems. The episode also highlights a critical gap: current international frameworks lack specific laws governing cross-border AI model extraction. At the alleged scale of more than 16 million exchanges, the potential for capability transfer without consent or oversight is significant.
