Mistral has launched Voxtral, its first family of open-source AI audio models, positioning it as a challenger to proprietary speech systems from companies such as Google and OpenAI. Voxtral aims to deliver high-performance speech understanding at a lower cost than those closed offerings, targeting developers who want an alternative to closed ecosystems. The family includes Voxtral Small, a 24-billion-parameter model for production-scale applications, and Voxtral Mini, a 3-billion-parameter variant for local deployment. A transcription-focused version, Voxtral Mini Transcribe, is also available.
Voxtral models go well beyond basic transcription. Thanks to a 32,000-token context window, they can process up to 30 minutes of audio for transcription and up to 40 minutes for understanding tasks. They detect the spoken language automatically and perform strongly in widely used languages, including English, Spanish, French, German, and Hindi. Voxtral can answer questions about audio and summarise it directly, without a separate transcription step, and it can trigger function calls from spoken commands. The models are available for download on Hugging Face and can be integrated into applications through a simple API. Mistral further plans to build Voxtral into its Le Chat chatbot.
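To see why a 32,000-token window is enough for half an hour or more of speech, here is a rough back-of-the-envelope sketch. It assumes the audio encoder emits about 12.5 tokens per second of audio; that rate is an assumption used for illustration, not a figure stated in this article, and the helper names are hypothetical.

```python
# Rough estimate of how much of Voxtral's context window a clip consumes.
# ASSUMPTION: ~12.5 audio tokens per second of speech (illustrative only).
AUDIO_TOKENS_PER_SECOND = 12.5
CONTEXT_WINDOW = 32_000  # context size reported for Voxtral


def audio_tokens(minutes: float, rate: float = AUDIO_TOKENS_PER_SECOND) -> int:
    """Hypothetical helper: estimated tokens used by `minutes` of audio."""
    return int(minutes * 60 * rate)


def remaining_for_text(minutes: float, context: int = CONTEXT_WINDOW) -> int:
    """Hypothetical helper: tokens left for the prompt and the model's reply."""
    return context - audio_tokens(minutes)


# Check the two durations quoted for transcription and understanding.
for minutes in (30, 40):
    used = audio_tokens(minutes)
    left = remaining_for_text(minutes)
    print(f"{minutes} min -> {used} audio tokens, {left} left for text")
```

Under this assumed rate, 30 minutes uses about 22,500 tokens and 40 minutes about 30,000, so both fit inside the window while still leaving room for a text prompt and response.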