Ollama Criticised for Performance, Attribution

16 April 2026

What happened

Ollama, a popular local LLM runner, has drawn criticism for obscuring its foundational reliance on llama.cpp, for falling short of the MIT licence's attribution requirements, and for later forking to an inferior custom ggml backend. That mid-2025 backend switch reintroduced previously fixed bugs and significantly slowed inference; llama.cpp now runs up to 1.8 times faster on comparable workloads. Ollama also misleadingly labelled distilled models, such as DeepSeek-R1-Distill-Qwen-32B, simply as "DeepSeek-R1" in its library, causing user confusion and reputational damage.

Why it matters

Teams deploying local LLMs with Ollama now face higher operational costs and reduced reliability because of the performance deficit and regressions. Benchmarks show llama.cpp reaching 161 tokens per second versus Ollama's 89, a 30-50% performance gap on CPU and roughly 70% higher throughput for llama.cpp on models such as Qwen-3 Coder 32B. Misleading model naming adds a further problem for procurement teams and researchers: labels that conflate distilled variants with the full model create confusion about actual capabilities and misrepresent performance expectations.
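The headline claims can be sanity-checked against the quoted throughput figures with simple arithmetic; this sketch only uses the numbers cited above (hardware and model details vary across the benchmarks):

```python
# Quoted benchmark figures (tokens per second); specific hardware unknown.
llama_cpp_tps = 161
ollama_tps = 89

# Ratio of throughputs: consistent with the "up to 1.8 times faster" claim.
speedup = llama_cpp_tps / ollama_tps
print(f"llama.cpp speedup: {speedup:.2f}x")

# Relative throughput advantage; the ~70% figure cited for Qwen-3 Coder 32B
# comes from a different run, so an exact match is not expected here.
advantage = (llama_cpp_tps - ollama_tps) / ollama_tps
print(f"throughput advantage: {advantage:.0%}")
```

The 161 vs 89 figures give a ratio of about 1.81, which lines up with the "1.8 times faster" claim; the per-model percentages differ because they come from separate benchmark runs.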
