What happened
Andyyyy64 released whichllm, an open-source command-line tool that automatically detects local hardware (GPU, CPU, RAM) and ranks suitable large language models (LLMs) from HuggingFace. Rather than relying solely on VRAM fit or parameter count, the tool prioritises models using real, recency-aware benchmark results, including LiveBench, Artificial Analysis, Aider, Chatbot Arena Elo, and Open LLM Leaderboard scores. It provides architecture-aware estimates of VRAM usage and inference speed, supports NVIDIA, AMD, Apple Silicon, and CPU-only systems, and offers features such as GPU simulation and one-command chat for immediate model interaction.
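To see why an architecture-aware estimate matters, consider a minimal sketch of a VRAM budget built from weights plus KV cache, where grouped-query attention (GQA) shrinks the cache relative to a naive parameter-count heuristic. The function, constants, and GQA handling below are illustrative assumptions, not whichllm's actual implementation.

```python
# Illustrative VRAM estimate: weights + KV cache + fixed overhead.
# All names and constants here are hypothetical, not whichllm's code.

BYTES_PER_PARAM = {"fp16": 2.0, "q8_0": 1.0, "q4_k_m": 0.5625}  # approx. bytes per weight

def estimate_vram_gb(
    n_params_b: float,          # parameter count in billions
    quant: str,                 # quantization label
    n_layers: int,              # transformer depth
    hidden_dim: int,            # model width
    n_heads: int,               # attention heads
    n_kv_heads: int,            # KV heads (< n_heads under GQA)
    ctx_len: int = 8192,        # context length to budget for
    overhead_gb: float = 0.75,  # runtime buffers, CUDA context, etc.
) -> float:
    weights = n_params_b * 1e9 * BYTES_PER_PARAM[quant]
    # KV cache: one K and one V tensor per layer; GQA shrinks the
    # effective width by n_kv_heads / n_heads. Entries in fp16 (2 bytes).
    kv_dim = hidden_dim * n_kv_heads / n_heads
    kv_cache = 2 * n_layers * ctx_len * kv_dim * 2
    return (weights + kv_cache) / 1e9 + overhead_gb

# A hypothetical Llama-3-style 8B model at Q4_K_M with an 8K context:
print(f"{estimate_vram_gb(8.0, 'q4_k_m', 32, 4096, 32, 8):.1f} GB")
```

Note the design point: two models with identical parameter counts can differ by gigabytes of KV cache depending on depth, width, and KV head count, which is exactly the gap a parameter-count-only heuristic misses.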
Why it matters
For platform engineers and architects, the tool makes selecting local LLMs for deployment more efficient. Its focus on evidence-based, recency-aware performance metrics, rather than model size alone, addresses the challenge of identifying genuinely performant models in a rapidly evolving ecosystem. This approach reduces guesswork in hardware planning and model selection by providing concrete data on VRAM fit, speed, and benchmark quality. Procurement teams can use GPU simulation to validate hardware purchases against specific model requirements, ensuring investments align with actual performance needs.
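One plausible way to make a ranking recency-aware is to decay each benchmark score by its age before averaging, so a fresh LiveBench result outweighs a year-old leaderboard entry. The half-life, normalization, and field names in this sketch are assumptions for illustration, not whichllm's published method.

```python
# Illustrative recency-weighted aggregation across benchmark sources;
# the 180-day half-life and all field names are hypothetical.
from dataclasses import dataclass
from datetime import date

@dataclass
class BenchmarkResult:
    source: str       # e.g. "LiveBench", "Aider", "Chatbot Arena"
    score: float      # normalized to 0..1 before aggregation
    published: date   # when the result was recorded

def recency_weight(result: BenchmarkResult, today: date,
                   half_life_days: float = 180.0) -> float:
    """Exponential decay: a result loses half its weight per half-life."""
    age_days = (today - result.published).days
    return 0.5 ** (age_days / half_life_days)

def aggregate(results: list[BenchmarkResult], today: date) -> float:
    """Recency-weighted mean over all available benchmark results."""
    weights = [recency_weight(r, today) for r in results]
    total = sum(weights)
    return sum(w * r.score for w, r in zip(weights, results)) / total if total else 0.0

scores = [
    BenchmarkResult("LiveBench", 0.62, date(2025, 5, 1)),
    BenchmarkResult("Aider", 0.58, date(2024, 11, 15)),
]
print(f"aggregate score: {aggregate(scores, date(2025, 6, 1)):.3f}")
```

Under this scheme, stale results still contribute but can no longer dominate a ranking, which is the behaviour the "recency-aware" framing implies.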




