What happened
Google released its open-source Gemma 4 model family for native, offline inference on iPhones via the Google AI Edge Gallery app. The 31B variant, comparable in scale to Qwen 3.5's 27B model, anchors the larger end of the family, while the smaller E2B and E4B variants are optimised for mobile efficiency and are the primary targets for direct on-device inference on iPhones. Inference runs on the iPhone's GPU for low-latency responses, with no API calls or cloud dependency. The Edge Gallery app also bundles image recognition, voice interaction, and an extensible Skills framework.
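The article does not say which runtime the Edge Gallery uses under the hood, but Google's documented route for running Gemma-class models on iOS is the MediaPipe LLM Inference API. Below is a minimal sketch of loading and querying a bundled mobile variant, assuming the MediaPipeTasksGenAI framework and a hypothetical model file named gemma-e2b.bin; option and method names follow Google's published iOS examples and should be verified against the current SDK.

```swift
import MediaPipeTasksGenAI

// Minimal sketch of offline generation with a bundled Gemma variant.
// The model file name ("gemma-e2b.bin") is hypothetical; in practice you
// bundle a model converted for the LLM Inference task with the app target.
enum LocalModelError: Error {
    case modelNotBundled
}

func runLocalGemma(prompt: String) throws -> String {
    guard let modelPath = Bundle.main.path(forResource: "gemma-e2b",
                                           ofType: "bin") else {
        throw LocalModelError.modelNotBundled
    }

    // Configure the on-device task. Sampling values are illustrative.
    let options = LlmInference.Options(modelPath: modelPath)
    options.maxTokens = 512   // combined prompt + response token budget
    options.topk = 40
    options.temperature = 0.8

    // Everything below runs locally; no network calls are made.
    let llm = try LlmInference(options: options)
    return try llm.generateResponse(inputText: prompt)
}
```

In this setup the task runtime handles GPU delegation and sampling internally; the app supplies only the converted model file and the prompt, which is what makes fully disconnected operation possible.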
Why it matters
On-device AI is now practical on consumer hardware, which shifts the calculus for enterprise applications. Low-latency, offline inference on iPhone GPUs removes cloud dependencies and addresses the data-privacy constraints that matter most in field operations and healthcare settings. Procurement teams and security architects should factor local AI capabilities into their vendor assessments and threat models, since sensitive data can now be processed entirely on-device. That combination lays the groundwork for new classes of secure, disconnected AI workflows.