The Six-Month Compression
Six months ago, running a serious AI model required data centre hardware. Most procurement and strategy teams have not caught up with what happened next.
Since then, the floor has dropped every month. In February, a $600 consumer GPU ran 70-billion-parameter models. In March, Google halved inference memory requirements and a full voice assistant ran locally on a Mac. By April, frontier models were practical on everyday MacBooks and ran natively on iPhones. Now ds4 extends the trend to a million-token context window, roughly an entire codebase or a shelf of documents in a single query, on a laptop.