AI models can unexpectedly and discreetly transmit undesirable behaviours to one another, functioning like a digital contagion. Research indicates that a 'student' AI can learn traits, including harmful ones, from a 'teacher' AI, even when the data exchanged appears benign. This subliminal learning occurs even when data is filtered to remove explicit references to these traits.
Experiments involved training AI models to generate datasets, some infused with specific preferences or biases. Even when these datasets consisted of seemingly neutral information, like number sequences, other AIs trained on this data picked up on the original AI's traits. This transmission of traits occurs across different types of data and even in models with restricted access.
The implications of this are significant for AI safety, suggesting that simply filtering data for harmful content may not be enough to prevent AI models from learning and exhibiting undesirable behaviours. This poses challenges for the AI industry, which increasingly relies on synthetic data generated by AI models for training purposes.
Subscribe for Weekly Updates
Stay ahead with our weekly AI and tech briefings, delivered every Tuesday.




