What happened
Abacus Noir demonstrated zero-copy GPU inference for WebAssembly (Wasm) modules on Apple Silicon, eliminating data-transfer overhead between CPU and GPU. Researchers shared a Wasm module's linear memory directly with the GPU by exploiting Apple's Unified Memory Architecture: a custom allocator built on Wasmtime's MemoryCreator trait supplies the linear memory, and Metal's makeBuffer(bytesNoCopy:length:) wraps that same memory as a GPU buffer without copying it. This reduced memory overhead to 0.03 MB for a 16 MB region, versus 16.78 MB for a traditional copy, and was validated with a matrix-multiply benchmark and Llama 3.2 1B inference.
Why it matters
This development reduces memory footprint and latency for AI inference on Apple Silicon, directly impacting platform engineers and architects building on-device AI applications. Letting Wasm modules share memory with the GPU without copying cuts the overhead of a 16 MB region from 16.78 MB to 0.03 MB, so more models, or larger ones, can reside in memory simultaneously. The mechanism is specific to Apple's Unified Memory Architecture, and it improves resource utilisation most for memory-bound workloads such as large language model KV caches, where memory efficiency determines how many models or users can be served concurrently.