On January 30th, Mistral AI, the French GenAI unicorn, introduced Small 3, a 24-billion-parameter LLM, demonstrating that an LLM does not need an astronomical number of parameters to be effective. Its successor, Small 3.1, keeps the same compact architecture while delivering significant improvements in performance, multimodal understanding, and long-context handling, surpassing models such as Google's Gemma 3-it 27B and OpenAI's GPT-4o Mini.

Like its predecessor, Small 3.1 has 24 billion parameters and can be deployed on accessible hardware, such as a PC with a single RTX 4090 GPU or a Mac with 32 GB of RAM, allowing companies to keep control over their sensitive data without relying on centralized cloud infrastructure. Inference speed is unchanged at 150 tokens per second, ensuring minimal latency for applications requiring instant responses. True to its open-source commitment, Mistral AI releases both models under the Apache 2.0 license, enabling the community to use, refine, and deploy them for a wide range of use cases.
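A rough back-of-envelope calculation suggests why a 24-billion-parameter model can fit on a 24 GB RTX 4090 or a 32 GB Mac: weight memory scales with bytes per parameter, so lower-precision (quantized) weights shrink the footprint. The precisions below are illustrative assumptions (the article does not state which precision is used), and the estimate ignores activation and KV-cache memory.

```python
# Approximate weight memory for a 24B-parameter model at various
# precisions. Illustrative only; real deployments also need memory
# for activations and the KV cache.
params = 24e9
bytes_per_param = {"fp16": 2, "int8": 1, "int4": 0.5}

for name, b in bytes_per_param.items():
    gb = params * b / 1e9
    print(f"{name}: ~{gb:.0f} GB of weights")
```

At fp16 the weights alone (~48 GB) exceed a single RTX 4090, which is why 8-bit or 4-bit quantization is the usual route to running a model of this size on consumer hardware.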

Source: Mistral AI

Performance Optimization

Small 3.1 builds on Small 3, and one of its major advancements is the expansion of the context window from 32,000 to 128,000 tokens, an essential asset for tasks involving reasoning over long text sequences. Whereas Mistral Small 3 focused primarily on text, version 3.1 improves image and document interpretation, positioning it favorably against small proprietary models and opening the door to varied applications, from industrial quality control to document recognition and automated medical image analysis.
Mistral Small 3.1 is available in two formats:
  • An instruction-tuned version, Mistral Small 3.1 Instruct, ready to use for conversational tasks and language understanding;
  • A pre-trained version, Mistral Small 3.1 Base, ideal for fine-tuning and specialization in specific domains (health, finance, legal, etc.).
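Fine-tuning a base model like Small 3.1 Base adjusts its weights on domain-specific data. One popular parameter-efficient approach is LoRA (a technique chosen here for illustration; the article does not prescribe any particular method): the pre-trained weight matrix W is frozen and only a low-rank update BA is trained, so the adapted layer computes (W + BA)x. A minimal numeric sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2  # hidden size and low rank (illustrative sizes only)
W = rng.normal(size=(d, d))          # frozen pre-trained weights
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # starts at zero: no initial change

def adapted_forward(x):
    """Layer output with the low-rank update: (W + B @ A) @ x."""
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
# Before any training, B is zero, so the adapted layer matches
# the frozen pre-trained layer exactly.
assert np.allclose(adapted_forward(x), W @ x)

# Training would update only A and B: 2*d*r parameters instead of d*d.
print(f"trainable params: {A.size + B.size} vs full: {W.size}")
```

The appeal for domain specialization (health, finance, legal) is that only the small A and B matrices are trained and stored per domain, while the full base weights are shared.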
 
The Instruct version is one of the best models in its category, outperforming its competitors on benchmarks requiring reasoning and contextual understanding. According to benchmarks shared by Mistral AI:
  • Small 3.1 Instruct shows better performance than Google's Gemma 3-it (27B) in textual, multimodal, and multilingual tasks;
  • It surpasses OpenAI's GPT-4o Mini on benchmarks such as MMLU, HumanEval, and LongBench v2, thanks notably to its extended 128,000-token context window;
  • It also outperforms Claude-3.5 Haiku in complex tasks involving long contexts and multimodal data;
  • It excels against Cohere Aya-Vision (32B) in multimodal benchmarks such as ChartQA and DocVQA, demonstrating advanced understanding of visual and textual data;
  • Small 3.1 shows high performance in multilingualism, surpassing its competitors in categories such as European and Asian languages.
Mistral Small 3.1 can be downloaded from the Hugging Face platform and tested on Mistral AI's platform, La Plateforme. It is also available on Google Cloud Vertex AI and will be offered on NVIDIA NIM in the coming weeks.

To better understand

What is an LLM (Large Language Model), and how does it work?

An LLM is an artificial intelligence model designed to understand and generate natural language. It consists of billions of parameters adjusted during training on large amounts of text to predict the next word (token) in a sequence. LLMs power applications such as machine translation, text summarization, and conversational agents.
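The next-word prediction described above can be sketched with a toy example: the model emits a raw score (logit) per vocabulary entry, a softmax turns the scores into probabilities, and decoding picks a token. The vocabulary and logit values below are entirely made up for illustration and bear no relation to Mistral's actual implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and the logits a model might emit for the prompt
# "The cat sat on the". All values are invented.
vocab = ["mat", "dog", "moon", "chair"]
logits = [3.2, 0.5, -1.0, 1.8]

probs = softmax(logits)
# Greedy decoding: pick the highest-probability token.
next_token = vocab[probs.index(max(probs))]
print(next_token)  # → mat
```

Real LLMs repeat this step token by token, appending each prediction to the input, and often sample from the distribution rather than always taking the maximum.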

What is the Apache 2.0 license and why is it significant for open source projects?

The Apache 2.0 license is a permissive open-source license that allows users to modify the software and use it for commercial or private purposes, and it includes an express grant of patent rights from contributors. It is significant because it keeps contributions free and accessible, fostering innovation and the adoption of new technologies.