SkyPilot Scales Agentic Research

What happened

SkyPilot scaled Andrej Karpathy's Autoresearch project by giving a Claude Code agent access to a 16-GPU Kubernetes cluster. Over eight hours, the agent ran approximately 910 experiments and identified model width as a critical scaling factor. It also optimised its own hardware usage, screening ideas on H100s and validating them on H200s. The run reduced val_bpb (validation bits per byte, where lower is better) from 1.003 to 0.974, a 2.87% improvement. This parallel approach reached the same validation loss nine times faster than a simulated sequential baseline: eight hours versus 72 hours.
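The screen-then-validate pattern can be sketched as a two-tier filter: score every idea on the cheaper tier, then re-run only the best few on the more capable GPUs. This is a minimal sketch under stated assumptions. The article does not say how the agent chose which ideas to promote, so `Candidate`, `screen_then_validate`, the `keep` cutoff, and the toy scoring functions are all hypothetical stand-ins for real training runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str
    width: int  # hypothetical knob; the article only reports that width mattered


def screen_then_validate(
    candidates: list[Candidate],
    screen: Callable[[Candidate], float],
    validate: Callable[[Candidate], float],
    keep: int,
) -> tuple[str, dict[str, float]]:
    """Score every candidate on the cheap tier, re-run only the best `keep`
    on the expensive tier, and return the winner plus its validation scores."""
    # Lower val_bpb is better, so rank ascending by the screening score.
    finalists = sorted(candidates, key=screen)[:keep]
    # Only the finalists pay for a run on the (assumed) scarcer validation GPUs.
    scores = {c.name: validate(c) for c in finalists}
    best = min(scores, key=scores.get)
    return best, scores


# Toy stand-ins for real training runs (hypothetical): wider models score slightly lower.
def toy_screen(c: Candidate) -> float:
    return 1.003 - c.width * 1e-5


def toy_validate(c: Candidate) -> float:
    return toy_screen(c) - 0.002  # pretend the longer validation run helps a little


pool = [Candidate(f"w{w}", w) for w in (256, 512, 768, 1024)]
best, scores = screen_then_validate(pool, toy_screen, toy_validate, keep=2)
print(best, sorted(scores))
```

The point of the split is cost: most ideas are discarded after the cheap screen, so the expensive tier only sees a small, pre-filtered set.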

Why it matters

Autonomous AI agents can now manage and optimise their own compute infrastructure, shortening research cycles. For platform engineers and architects, the work shifts from provisioning resources by hand to defining agent objectives and evaluating outcomes. SkyPilot lets agents run factorial experiment grids across heterogeneous hardware; in this case, that cut time-to-result ninefold and improved val_bpb by 2.87% within a fixed budget. Teams should assume that agentic workflows will provision and manage resources on their own, and should prioritise comprehensive monitoring and cost controls accordingly.
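A factorial experiment grid is simply every combination of the chosen factor levels, which is what makes it easy to fan out across many GPUs at once. The sketch below is illustrative only: the factor names and values, the four-GPU pool, and the helpers `factorial_grid` and `assign_round_robin` are hypothetical and do not describe the agent's actual setup or any SkyPilot API.

```python
import itertools
import math


def factorial_grid(factors: dict[str, list]) -> list[dict]:
    """Full factorial design: one run per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def assign_round_robin(runs: list[dict], gpus: list[str]) -> dict[str, list[dict]]:
    """Spread runs evenly across a GPU pool; each GPU works through its queue in order."""
    queues: dict[str, list[dict]] = {gpu: [] for gpu in gpus}
    for i, run in enumerate(runs):
        queues[gpus[i % len(gpus)]].append(run)
    return queues


# Hypothetical factors and a hypothetical mixed pool; not the article's actual configuration.
grid = factorial_grid({"width": [512, 768, 1024], "lr": [1e-3, 3e-3], "depth": [8, 12]})
gpus = ["h100-0", "h100-1", "h200-0", "h200-1"]
queues = assign_round_robin(grid, gpus)
waves = math.ceil(len(grid) / len(gpus))  # rounds needed if every run takes equal time
print(len(grid), waves)
```

Round-robin is the simplest choice but ignores that faster GPUs finish sooner; a shared work queue, where any idle GPU pulls the next pending run, would keep a heterogeneous pool busier.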
