What happened
Inception Labs released Mercury 2, a large language model built on a diffusion architecture that generates responses through parallel refinement rather than sequential, token-by-token decoding. The result is generation over five times faster than conventional autoregressive models, reaching 1,009 tokens per second on NVIDIA Blackwell GPUs. Mercury 2 offers a 128K context window, native tool use, and schema-aligned JSON output, priced at $0.25 per million input tokens and $0.75 per million output tokens.
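To put the pricing and throughput figures in concrete terms, here is a minimal back-of-the-envelope sketch. The function names are illustrative only, not part of any official Inception Labs SDK; the constants come straight from the numbers above.

```python
# Rough cost and latency math using the published figures.
# Function names are illustrative; this is not an official SDK.

INPUT_PRICE_PER_M = 0.25    # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.75   # USD per million output tokens
THROUGHPUT_TOK_S = 1009     # reported tokens/second on Blackwell GPUs

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at Mercury 2's list prices."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

def generation_latency(output_tokens: int) -> float:
    """Seconds to generate a response at the reported throughput."""
    return output_tokens / THROUGHPUT_TOK_S

# Example: a 2,000-token prompt producing a 500-token answer.
print(f"cost: ${request_cost(2_000, 500):.6f}")    # ~$0.000875
print(f"latency: {generation_latency(500):.2f}s")  # ~0.50s
```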
Why it matters
This architectural shift redefines the speed-quality trade-off for production AI, bringing reasoning-grade quality within real-time latency budgets. For platform engineers and product teams, Mercury 2's throughput makes more complex agentic workflows and interactive applications practical, such as coding assistants and voice interfaces, where compounding per-step latency previously limited deployment. The 1,009 tokens-per-second figure on Blackwell GPUs sets a new reference point for real-time inference.
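A quick illustration of why per-step latency compounds in agentic workflows. The step count, tokens per step, and the ~200 tok/s autoregressive baseline (derived from the "over five times faster" claim) are all assumptions for the sake of the sketch, not measured figures.

```python
# Illustrative model of compounding latency in a sequential agent run.
# STEPS, TOKENS_PER_STEP, and the ~200 tok/s baseline are assumptions.

STEPS = 10               # sequential reasoning / tool-use steps per run
TOKENS_PER_STEP = 300    # generated tokens per step (assumed)

def run_latency(throughput_tok_s: float) -> float:
    """Total generation time for one agent run, ignoring tool overhead."""
    return STEPS * TOKENS_PER_STEP / throughput_tok_s

print(f"autoregressive (~200 tok/s): {run_latency(200):.1f}s")   # ~15.0s
print(f"Mercury 2 (1,009 tok/s):     {run_latency(1009):.1f}s")  # ~3.0s
```

At these assumed settings, the same ten-step run drops from roughly fifteen seconds of pure generation time to about three, which is the difference between a batch tool and an interactive one.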