OpenAI's o3 model dominated xAI's Grok 4 in the final of the Kaggle AI Chess Exhibition Tournament, securing a clean 4-0 victory. The event, the first of its kind, tested general-purpose large language models (LLMs) in a strategic game setting. Eight models participated, with entries from Google, Anthropic, and others, all competing under standard chess rules.
Grok 4, initially a strong contender, faltered in the final, making tactical errors that commentators flagged as blunders, including the loss of a bishop early in the first game. In contrast, o3 played consistently and strategically throughout the tournament. While neither model is a dedicated chess engine, the competition highlighted the ability of LLMs to apply reasoning skills in complex environments.
Elon Musk downplayed the loss, stating that xAI had dedicated minimal effort to chess-specific training. Google's Gemini 2.5 Pro secured third place, defeating another OpenAI entry. The event underscored the variability in how LLMs handle structured tasks and served as a benchmark for evaluating the reasoning and problem-solving capabilities of general-purpose AI systems.