Karpathy Cools on Reinforcement Learning

29 August 2025

Andrej Karpathy, a prominent AI researcher and former OpenAI scientist, has expressed reservations about the long-term potential of reinforcement learning (RL). While he acknowledges the value of building environments in which AI systems can interact and learn, Karpathy suggests that RL may not be the key to replicating human-like intelligence. He believes that humans do not primarily learn through reinforcement learning and that reward functions are inherently suspect.

Karpathy highlights the importance of creating diverse, high-quality environments for AI training and evaluation, comparable to the focus on internet text during the pre-training era and on conversations during supervised fine-tuning. He suggests that while RL may yield intermediate gains, new techniques will be necessary to achieve true human-level AI. Karpathy draws a distinction between RL's approach of adjusting action probabilities based on outcomes and the human ability to internalise lessons and develop 'second nature' skills.
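
As a rough illustration of what "adjusting action probabilities based on outcomes" can mean in practice, the following is a minimal sketch of a policy-gradient (REINFORCE-style) update on a toy three-armed bandit. It is not drawn from Karpathy's remarks: the reward values, learning rate, baseline scheme, and the function names `softmax` and `train_bandit` are all assumptions chosen for the example.

```python
import math
import random


def softmax(logits):
    """Convert raw action preferences (logits) into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """Run a REINFORCE-style update on a bandit where arm i always pays rewards[i].

    Hypothetical toy setup for illustration only.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # uniform policy to start
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) with respect to logit i
        # is (1 if i == action else 0) - probs[i].
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


final_probs = train_bandit(rewards=[0.0, 0.5, 1.0])
print(final_probs)
```

In this sketch, actions that pay more than the running average become more likely and actions that pay less become less likely. The only thing carried forward from each trial is a shift in the logits, which is the kind of outcome-driven adjustment the article contrasts with human learning.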

Karpathy's perspective suggests a need to explore alternative learning paradigms beyond RL to unlock the full potential of AI. He remains optimistic about agentic interactions within environments, but believes that a different approach is required to replicate human intelligence.

AI-generated content may differ from the original.

Published on 29 August 2025