What happened
Andrej Karpathy released autoresearch, a GitHub project that lets AI agents conduct LLM training research autonomously. Agents modify train.py, which contains the full GPT model, optimizer, and training loop, following instructions written by humans in program.md, a Markdown file. Each experiment runs for a fixed 5-minute budget on a single NVIDIA GPU such as an H100, with the goal of minimising val_bpb (validation bits per byte). The setup, detailed in the March 2026 repository release, turns LLM training research into an automated loop of agent-driven experiments.
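As a rough illustration of the loop described above, here is a minimal Python sketch, not the repository's actual code: train until a fixed wall-clock budget is spent, then score the run by validation bits per byte. `Model`, `train_step`, and `evaluate` are hypothetical stand-ins for what train.py would define.

```python
import math
import time

# Fixed per-experiment wall-clock budget described in the write-up above.
TRAIN_BUDGET_SECONDS = 5 * 60


class Model:
    """Placeholder for the GPT model that train.py would define."""


def train_step(model: Model, step: int) -> float:
    """Hypothetical single optimizer step; returns a dummy training loss in nats."""
    return 1.0 / (step + 1)


def evaluate(model: Model) -> tuple[float, int]:
    """Hypothetical validation pass; returns (summed loss in nats, bytes of text scored)."""
    return 2000.0, 1000


def run_experiment(budget_seconds: float = TRAIN_BUDGET_SECONDS) -> float:
    """Train until the wall-clock budget is spent, then report validation bits per byte."""
    model = Model()
    deadline = time.monotonic() + budget_seconds
    step = 0
    while time.monotonic() < deadline:
        train_step(model, step)
        step += 1

    # Bits per byte: cross-entropy summed in nats, converted to bits, divided by bytes scored.
    total_nats, total_bytes = evaluate(model)
    return total_nats / (math.log(2) * total_bytes)


if __name__ == "__main__":
    # A short budget keeps the sketch quick to run; the real setup uses the full 5 minutes.
    print(f"val_bpb: {run_experiment(budget_seconds=1.0):.4f}")
```

Because lower bits per byte is better and the wall-clock budget is held constant, any change the agent makes to train.py can be judged on a single number.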
Why it matters
For platform engineers and research teams, the work shifts from editing model code directly to writing the instructions an agent follows, which changes how effort and compute are allocated for model optimisation. The fixed 5-minute budget makes architectural and hyperparameter changes directly comparable within a single hardware setup, though results do not transfer cleanly across different compute platforms. This continues the broader trend toward agentic engineering, in which AI agents take on more autonomous development tasks; teams should assume such workflows will increasingly run core development loops.
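The shift is easiest to see with a concrete, hypothetical instruction file: instead of editing train.py, an engineer writes a note like the one below and the agent makes the code changes itself. The contents are invented for illustration and are not taken from the repository; the real file's wording is whatever the human operator chooses.

```python
# Hypothetical contents of program.md, invented for illustration only.
PROGRAM_MD = """\
# Instructions for the research agent

- Objective: reduce val_bpb within the fixed 5-minute budget on a single H100.
- You may edit train.py only; keep the tokenizer and dataset unchanged.
- First try a learning-rate warmup followed by cosine decay.
- After each run, record the val_bpb and the exact diff you applied.
"""

print(PROGRAM_MD)
```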