What happened
Qwen released the Qwen-Robot Suite, a collection of three foundation models (Qwen-RobotNav, Qwen-RobotManip, and Qwen-RobotWorld) designed to bridge the gap between vision-language understanding and physical robot control. Qwen-RobotNav unifies five navigation task families, achieves a 76.5% success rate (SR) on VLN-CE RxR, and deploys zero-shot on a Unitree Go2 quadruped. Qwen-RobotManip draws on more than 38,100 hours of robot data across 15 embodiments and leads the RoboChallenge Table30 v1 generalist track with a 45% SR. Qwen-RobotWorld is co-trained across 20+ embodiments on a corpus of 8.6 million video-text pairs and ranks first on EWMBench and WorldModelBench.
Why it matters
The suite is meant to let physical agents translate high-level language instructions directly into complex physical actions across diverse robot embodiments and environments. It targets a long-standing bottleneck, heterogeneous robot data and incompatible action spaces, by unifying them behind language-first interfaces. Robotics developers and platform engineers gain tools for building more capable agentic systems, which could accelerate the deployment of embodied AI. The release follows Google DeepMind's recent Gemini Robotics-ER 1.6, suggesting a broader industry focus on advanced robot control.




