SkyPilot Scales Agentic Research

20 March 2026

What happened

SkyPilot scaled Andrej Karpathy's Autoresearch project by providing a Claude Code agent with a 16-GPU Kubernetes cluster. Over eight hours, the agent executed approximately 910 experiments, identifying model width as a critical scaling factor. It autonomously optimised hardware usage, screening ideas on H100s and validating on H200s, reducing val_bpb from 1.003 to 0.974, a 2.87% improvement. This parallel approach achieved the same validation loss nine times faster than a simulated sequential baseline, completing in eight hours versus 72 hours.
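The two headline ratios can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted above; the function names are illustrative. Note that the rounded val_bpb values give roughly 2.89%, so the reported 2.87% presumably comes from unrounded measurements (an assumption, since the brief does not give them).

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after` (lower is better)."""
    return (before - after) / before * 100.0


def speedup(baseline_hours: float, parallel_hours: float) -> float:
    """How many times faster the parallel run is than the baseline."""
    return baseline_hours / parallel_hours


# Rounded figures as quoted in the brief.
improvement = relative_improvement(1.003, 0.974)
factor = speedup(72, 8)

print(f"val_bpb reduction: {improvement:.2f}%")
print(f"speedup over sequential baseline: {factor:.0f}x")
```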

Why it matters

Autonomous AI agents can now manage and optimise their own compute infrastructure, accelerating research cycles. Platform engineers and architects face a shift from manual provisioning to defining agent objectives and evaluating outcomes. With SkyPilot, an agent can run factorial experiment grids across heterogeneous hardware; in this run, that cut time-to-result ninefold and lowered val_bpb by 2.87% within a fixed budget. Teams should assume agentic workflows will provision and manage resources autonomously, and should prioritise comprehensive monitoring and cost controls.
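To make "factorial experiment grid" concrete, here is a minimal sketch of enumerating every combination of a few factors and spreading the runs across workers. It is not SkyPilot's API and not the agent's actual search space: the factor names, values, and worker labels are hypothetical, chosen only because the brief names model width as an important factor.

```python
from itertools import product

# Hypothetical search space; the brief does not list the real factors or values.
grid = {
    "width": [512, 768, 1024],
    "depth": [8, 12],
    "learning_rate": [1e-3, 3e-3],
}


def factorial_grid(space):
    """Return one config dict per combination of factor values."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in product(*(space[name] for name in names))
    ]


def assign_round_robin(configs, workers):
    """Distribute configs across workers in round-robin order."""
    assignment = {worker: [] for worker in workers}
    for index, config in enumerate(configs):
        assignment[workers[index % len(workers)]].append(config)
    return assignment


configs = factorial_grid(grid)
workers = [f"h100-{i}" for i in range(4)]  # hypothetical worker labels
plan = assign_round_robin(configs, workers)

for worker, jobs in plan.items():
    print(worker, len(jobs), "runs")
```

A full factorial grid grows multiplicatively with each factor (here 3 x 2 x 2 = 12 runs), which is why running it in parallel rather than sequentially matters for wall-clock time.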
