What happened
Google released Gemma 4, an open large language model family, including a 26B-A4B Mixture-of-Experts (MoE) variant. The MoE model activates only 4 billion of its 26 billion parameters per forward pass, enabling local execution on consumer hardware at 51 tokens per second. It scores 82.6% on MMLU Pro and 88.3% on AIME 2026, and offers 256K context, vision, and function calling. Concurrently, LM Studio 0.4.0 introduced a headless CLI (lms) and the llmster daemon, enabling command-line model management, parallel request processing, and a stateful REST API for local inference.
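As a sketch of what local REST inference looks like, the snippet below builds an OpenAI-compatible chat request and posts it to a locally served model. The base URL and port follow LM Studio's documented local-server defaults, but verify them against your install; the model identifier is hypothetical.

```python
import json
import urllib.request

# Default local server address for LM Studio's OpenAI-compatible API (assumption;
# check your own server settings).
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-compatible chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(model: str, prompt: str) -> str:
    """POST the payload to the local server and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires a running local server; the model id below is hypothetical):
#   print(chat("gemma-4-26b-a4b", "Summarize MoE routing in one sentence."))
```

Because the payload follows the OpenAI chat-completion shape, the same client code can be pointed at a cloud endpoint or a local daemon by changing only the base URL.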
Why it matters
Gemma 4's release expands local inference options for frontier-class models, reducing reliance on cloud APIs for developers and security architects. The 26B-A4B MoE variant delivers performance comparable to larger dense models while activating far fewer parameters, cutting hardware requirements and operating costs for on-device AI. LM Studio's new CLI and daemon streamline integration into CI/CD pipelines and headless server environments, keeping data on-premises and reducing latency for sensitive workloads. This follows Ollama's recent work to accelerate LLM inference on Macs.