Tech giants Google DeepMind, Anthropic, and Microsoft are intensifying efforts to defend against 'indirect prompt injection attacks'. This emerging security flaw allows hackers to embed malicious instructions within data sources that AI systems access, such as documents or web pages. When the AI processes this compromised content, it can be manipulated into performing unintended actions, including data leaks, spreading misinformation, or executing malicious code.
Unlike direct prompt injection, where attackers directly input malicious prompts, indirect attacks exploit the AI's interaction with external data. The AI unknowingly treats the embedded commands as legitimate instructions, bypassing traditional security measures. This poses a significant risk to AI-powered applications, as it can lead to unauthorised access and privilege escalation.
To counter this threat, tech companies are developing multi-layered defences, including data governance strategies, content sanitisation, and real-time threat detection. These measures aim to distinguish between legitimate content and malicious instructions, ensuring the safe and reliable operation of AI systems.




