What happened
Sakana AI researchers introduced "RL Conductor," a 7-billion-parameter language model trained via reinforcement learning to automatically orchestrate diverse worker LLMs. The conductor dynamically analyses each input, distributes tasks, and coordinates among agents, generating customised natural language workflows. It achieved an average score of 77.27% across difficult reasoning and coding benchmarks, outperforming both individual frontier models such as GPT-5 and Claude Sonnet 4 and human-designed multi-agent pipelines. RL Conductor also cut token usage by 83%, averaging 1,820 tokens per question compared with 11,203 for Mixture-of-Agents. The technology forms the backbone of Fugu, Sakana AI's commercial multi-agent orchestration service.
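To make the orchestration pattern concrete, here is a minimal sketch of a conductor-style dispatch loop. This is an illustrative toy, not Sakana AI's actual RL Conductor: the `Worker` type, the `conductor` function, and the keyword-based routing rule are all assumptions standing in for the learned RL policy, and the worker calls are stubbed where a real system would invoke LLM APIs.

```python
# Hypothetical sketch of conductor-style orchestration (NOT the real
# RL Conductor): a "conductor" inspects each task, selects worker
# models, and stitches their outputs into one coordinated answer.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Worker:
    name: str
    skill: str                      # e.g. "code" or "reasoning"
    run: Callable[[str], str]       # stub for an LLM API call

def conductor(task: str, workers: list[Worker]) -> str:
    """Toy routing policy standing in for the trained RL policy:
    pick workers whose skill tag looks relevant to the task, then
    join their outputs as the coordinated answer."""
    skill = "code" if "implement" in task.lower() else "reasoning"
    chosen = [w for w in workers if w.skill == skill] or workers[:1]
    return " | ".join(w.run(task) for w in chosen)

workers = [
    Worker("coder-a", "code", lambda t: f"coder-a solved: {t}"),
    Worker("thinker-b", "reasoning", lambda t: f"thinker-b reasoned: {t}"),
]

print(conductor("Implement a parser", workers))
# → coder-a solved: Implement a parser
```

In the real system, the routing decision is produced by the 7B model's learned policy rather than a hard-coded rule, which is exactly what removes the need for hand-designed pipelines.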
Why it matters
Automated LLM orchestration reduces operational costs and improves performance on complex AI tasks. Platform engineers and architects gain a way to deploy multi-model solutions without rigid, hard-coded pipelines, since RL Conductor dynamically optimises model selection and workflow generation; the burden shifts from manual workflow design to training an orchestrator. Procurement teams can expect lower API expenses, given the 83% token usage reduction demonstrated. This follows ENTERPILOT's unified AI gateway release in April, indicating a trend towards abstracted model access.
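The cost claim can be sanity-checked directly from the per-question figures quoted above. The token counts come from the article; the per-token price below is an illustrative assumption, not a quoted rate.

```python
# Back-of-envelope check of the token-economics claim, using only the
# figures quoted in the article (1,820 tokens/question for RL Conductor
# vs 11,203 for Mixture-of-Agents).
conductor_tokens = 1_820
moa_tokens = 11_203

reduction = 1 - conductor_tokens / moa_tokens
print(f"reduction: {reduction:.1%}")   # ~83.8%, consistent with the reported 83%

# Assumed illustrative price of $5 per million tokens (not a real quote):
price_per_token = 5 / 1_000_000
savings_per_question = (moa_tokens - conductor_tokens) * price_per_token
print(f"saved per question: ${savings_per_question:.4f}")
```

At any linear per-token price, the dollar savings scale with the same ~83% factor, which is why the reduction translates directly into lower API bills.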




