Inception Labs Ships Faster AI Model

What happened

Inception Labs launched Mercury 2, a diffusion-based AI model that generates approximately 1,000 tokens per second and that the company claims is the world's fastest reasoning language model. Mercury 2 scored 90% on the AIME 2026 benchmark and 77% on GPQA, outperforming Google's DiffusionGemma, which achieved 69.1% on AIME 2026 at similar speeds. Augment Code reported an 82% latency reduction and a 90% cost cut after integrating Mercury 2 into its subagents, while maintaining output quality. Mercury 2 is a paid, closed-weight API model, in contrast to Google's free, open-weight DiffusionGemma.

Why it matters

High-speed inference can cut both operating costs and latency for critical applications. Platform engineers and procurement teams gain new options for optimising agentic workflows, as Augment Code's reported 82% latency reduction and 90% cost cut with Mercury 2 show. This speed comes from parallel diffusion generation and enables more responsive AI systems. However, because Mercury 2 is available only as a closed-weight API, these gains come with vendor dependency, whereas Google's DiffusionGemma is open-weight. Teams must weigh performance against ecosystem lock-in when adopting these new architectures.
