OpenAI has significantly hardened the security of its ChatGPT Agent through rigorous red teaming exercises. A coordinated effort involving 110 attack attempts led to seven exploit fixes, resulting in what OpenAI claims is a 95% defence rate against attacks.
ChatGPT Agent integrates tools such as a visual browser, a text-based browser, a terminal, and API access, enabling it to handle complex tasks autonomously within a secure, virtual environment. Users retain control: they grant permission for significant actions and can interrupt tasks at any time. The red teaming process involved external experts stress-testing safeguards in realistic scenarios, with biology-trained reviewers validating evaluation data. OpenAI is also launching a bug bounty program to identify real-world risks.
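The permission-gating behaviour described above can be sketched in a few lines. This is a hypothetical illustration of the general pattern, not OpenAI's implementation: the names `ToolCall` and `run_agent_step` are invented for this example, and the real agent's approval flow is not public.

```python
# Hypothetical sketch of permission-gated tool execution: before the agent
# performs a significant action, it pauses and asks the user to approve.
# All names here are illustrative, not part of any OpenAI API.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    tool: str          # e.g. "terminal", "browser", "api"
    action: str        # human-readable description of what will happen
    significant: bool  # does this action need explicit user approval?

def run_agent_step(call: ToolCall, approve: Callable[[ToolCall], bool]) -> str:
    """Execute a tool call, gating significant actions on user approval."""
    if call.significant and not approve(call):
        return f"skipped: user declined '{call.action}'"
    return f"executed: {call.action} via {call.tool}"

# Usage: a stub callback stands in for a real interactive prompt.
result = run_agent_step(
    ToolCall(tool="terminal", action="delete temp files", significant=True),
    approve=lambda call: True,
)
```

In a real system the `approve` callback would surface a prompt to the user and block until they respond, which is also the natural point at which a task can be interrupted.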
This initiative is part of OpenAI's broader effort to ensure AI safety, fairness, and ethical integrity. The company uses a multi-layered approach to safety, partnering with external entities to proactively identify and mitigate potential risks. The agent is rolling out to Pro, Plus, and Team users, with Enterprise and Education users gaining access in the coming weeks.
Related Articles

- OpenAI Agentic AI Risks Emerge
- ChatGPT Adds 'Record Mode'
- ChatGPT Unveils General Purpose Agent
- ChatGPT to Automate Tasks