
Commodore 64 Runs 25K-Parameter Transformer

21 April 2026 · By Pulse24 desk

What happened

A developer has implemented a 25,000-parameter, 2-layer decoder-only transformer, named Soul Player C64, on an unmodified 1 MHz Commodore 64. The model, written in hand-coded 6502/6510 assembly, features multi-head causal self-attention, softmax, and RMSNorm, all using int8 quantization. It generates roughly one token per minute and fits entirely on a floppy disk. The key trick is a 14-bit shift used to normalise softmax scores, which makes attention weights meaningful on 8-bit hardware.
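The article does not detail how the 14-bit shift is applied; a plausible reading is that softmax weights are held in fixed point, scaled so they sum to 2^14, letting the weighted sum over int8 values be computed in integer arithmetic and recovered with a single right shift. The sketch below illustrates that idea under those assumptions (function names and the `SHIFT` constant are hypothetical; a real C64 port would replace `math.exp` with a lookup table, since the 6502 has no floating point).

```python
import math

SHIFT = 14  # assumed fixed-point scale, mirroring the article's "14-bit shift"

def int_softmax(scores):
    # Subtract the max for numerical range, as in standard softmax.
    m = max(scores)
    # exp() is used here for clarity; an 8-bit port would use a small
    # lookup table over quantized score differences instead.
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    # Normalise so the integer weights sum to 2**SHIFT. The attention
    # output then stays in integer arithmetic end to end.
    return [round(e / total * (1 << SHIFT)) for e in exps]

def attend(weights, values):
    # values: int8 activations; the right shift undoes the 2**SHIFT scale.
    return sum(w * v for w, v in zip(weights, values)) >> SHIFT
```

With uniform scores the weights are all 2^14 / n, so `attend` reduces to the integer mean of the values, which is the sanity check one would run on real hardware.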

Why it matters

Extreme efficiency for transformer architectures on severely constrained hardware is now demonstrable, pushing the boundaries of edge AI and embedded systems. For platform engineers and hardware architects, this highlights the potential for deploying sophisticated AI models in environments previously considered impossible, albeit with significant latency. This follows a broader industry trend towards efficient inference, as seen with Google's TurboQuant Algorithm, where model compression and optimisation are critical for expanding AI's reach beyond data centres.

Source · github.com
AI-processed content may differ from the original.