What happened
Ollama updated its platform to version 0.19+, integrating Apple's MLX framework for faster inference on Apple Silicon and NVIDIA's NVFP4 format for memory-efficient accuracy. Released March 31, 2026, the update also improves caching for agentic workflows, reducing memory utilisation and accelerating responses. A new guide details running Google's Gemma 4 8B model on 16GB Apple Silicon Mac minis, where it consumes 9.6GB; the 26B model requires 17GB and degrades performance on 24GB systems.
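A quick back-of-the-envelope calculation shows why 4-bit formats like NVFP4 matter for these footprints. The helper below is an illustrative sketch, not Ollama's actual memory accounting: it estimates weight storage only (parameters × bits ÷ 8), while the article's 9.6GB figure for the 8B model also includes KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a quantized LLM.
# Illustrative only: real footprints (e.g. the article's 9.6GB for
# Gemma 4 8B) add KV cache, activations, and runtime overhead on top.

def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """GB needed for weights alone: params * bits_per_weight / 8."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Gemma 4 8B at 4-bit weights: ~4 GB before cache and overhead.
print(round(estimate_weights_gb(8, 4), 1))   # → 4.0
# The 26B model at 4-bit: ~13 GB of weights alone, explaining why
# it strains even 24GB systems once cache and overhead are added.
print(round(estimate_weights_gb(26, 4), 1))  # → 13.0
```

The gap between the ~4GB weight estimate and the observed 9.6GB total is a useful rule of thumb when sizing hardware: budget roughly double the raw weight footprint for agentic workloads with long contexts.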
Why it matters
Local execution of frontier models on consumer hardware is now practical for platform engineers and developers. Ollama's MLX integration and NVFP4 support lower the hardware floor for self-hosted inference, cutting operational costs and improving response times for agentic applications by shifting compute from cloud to edge devices. Procurement teams can now evaluate lower-spec Apple Silicon devices for specific LLM workloads.