What happened
SkyPilot scaled Andrej Karpathy's Autoresearch project by giving a Claude Code agent access to a 16-GPU Kubernetes cluster. Over eight hours, the agent ran approximately 910 experiments and identified model width as a critical scaling factor. It also chose how to use the hardware on its own, screening ideas on H100s and validating the promising ones on H200s. The run reduced val_bpb (validation bits per byte, where lower is better) from 1.003 to 0.974, a 2.87% improvement. The parallel setup reached the same validation loss nine times faster than a simulated sequential baseline: eight hours versus 72 hours.
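The headline ratios follow from the reported figures. The short Python sketch below reproduces that arithmetic; the variable names are illustrative, and the efficiency figure is a naive ratio that the source does not report.

```python
# Figures reported for the run.
parallel_hours = 8        # wall-clock time of the 16-GPU run
sequential_hours = 72     # simulated sequential baseline
experiments = 910         # approximate experiment count
gpus = 16

# Time-to-same-validation-loss speedup.
speedup = sequential_hours / parallel_hours

# Average throughput across the cluster and per GPU.
experiments_per_hour = experiments / parallel_hours
experiments_per_gpu_hour = experiments / (parallel_hours * gpus)

# Naive ratio of speedup to GPU count. This is an assumption-laden figure:
# the speedup measures time to reach the same loss, not raw throughput.
naive_efficiency = speedup / gpus

print(f"speedup: {speedup:.1f}x")
print(f"experiments per hour: {experiments_per_hour:.2f}")
print(f"experiments per GPU-hour: {experiments_per_gpu_hour:.2f}")
print(f"speedup / GPU count: {naive_efficiency:.4f}")
```

On these numbers the cluster averaged about 114 experiments per hour, or roughly seven per GPU-hour.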
Why it matters
Autonomous AI agents can now manage and optimise their own compute infrastructure, which shortens research cycles. For platform engineers and architects, the work shifts from manual provisioning to defining agent objectives and evaluating outcomes. With SkyPilot handling provisioning, the agent could run factorial experiment grids across heterogeneous hardware; in this run that cut time-to-result roughly ninefold and improved val_bpb by 2.87% within a fixed budget. Teams should assume agentic workflows will provision and manage resources autonomously, and should prioritise comprehensive monitoring and cost controls.
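To make "factorial experiment grid across heterogeneous hardware" concrete, here is a minimal Python sketch of a two-stage plan: every combination is screened on one GPU type and the best few are re-run on another. It is not the agent's actual code. The search space, the function names `factorial_grid` and `plan`, and the scores are all hypothetical; the source names only model width as a key factor and the H100/H200 split.

```python
from itertools import product

# Hypothetical search space (assumption): only "width" is named in the source.
GRID = {
    "width": [512, 768, 1024],
    "depth": [8, 12],
    "lr": [3e-4, 6e-4],
}

def factorial_grid(space):
    """Return every combination of factor levels as a list of dicts."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]

def plan(configs, scores, top_k):
    """Screen all configs on H100s, then validate the top_k on H200s.

    scores[i] is the screening metric for configs[i]; lower is better,
    matching val_bpb.
    """
    screening = [{"config": c, "gpu": "H100"} for c in configs]
    ranked = sorted(range(len(configs)), key=lambda i: scores[i])
    validation = [{"config": configs[i], "gpu": "H200"} for i in ranked[:top_k]]
    return screening, validation

configs = factorial_grid(GRID)

# Dummy scores for illustration only, not measured results:
# later configs are simply assigned lower (better) values.
scores = [float(len(configs) - i) for i in range(len(configs))]

screening, validation = plan(configs, scores, top_k=3)
print(len(screening), "screening jobs;", len(validation), "validation jobs")
for job in validation:
    print(job["gpu"], job["config"])
```

In a real setup each job dict would be handed to whatever launcher the team uses; the point of the split is that cheap screening runs filter the grid before the scarcer hardware is spent on confirmation.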