With nearly 90% of organizational data stored as digital documents, exploiting that data efficiently has become a strategic challenge. To unlock its potential, Mistral AI is launching Mistral OCR, an optical character recognition API that, according to the company, sets a new standard in document understanding.
Optical Character Recognition (OCR) is a technology that converts scanned documents, images, or PDF files into text that can be processed by software. OCR analyzes the shapes of letters and symbols to transcribe them into digital data, making the information accessible, editable, and usable by computer systems.
Unlike traditional OCR solutions, Mistral OCR goes beyond mere text extraction. Its multimodal approach allows it to understand and extract tables, images, and mathematical equations, and to handle advanced layouts such as LaTeX-formatted content. This capability makes it well suited to AI systems that process diverse documents, such as presentations or scientific articles.
Model Performance
The performance of Mistral OCR has been evaluated against the best existing solutions. Benchmark results show that it outperforms its competitors in every category of the table below. On scanned documents, it achieves an accuracy of 98.96%, making it particularly effective for digitizing paper documents. It also confirms its reliability in multilingual processing, with a score of 89.55% in this benchmark and, according to Mistral AI, accuracy above 99% for several individual languages.
Model | Overall | Math | Multilingual | Scanned | Tables
---|---|---|---|---|---
Google Document AI | 83.42 | 80.29 | 86.42 | 92.77 | 78.16 |
Azure OCR | 89.52 | 85.72 | 87.52 | 94.65 | 89.52 |
Gemini-1.5-Flash-002 | 90.23 | 89.11 | 86.76 | 94.87 | 90.48 |
Gemini-1.5-Pro-002 | 89.92 | 88.48 | 86.33 | 96.15 | 89.71 |
Gemini-2.0-Flash-001 | 88.69 | 84.18 | 85.80 | 95.11 | 91.46 |
GPT-4o-2024-11-20 | 89.77 | 87.55 | 86.00 | 94.58 | 91.70 |
Mistral OCR 2503 | 94.89 | 94.29 | 89.55 | 98.96 | 96.12 |
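
To make the size of the lead concrete, the short Python sketch below recomputes, for each column of the table, the gap between Mistral OCR 2503 and the best-scoring competitor. The scores are copied from the table; the function name `margins_over_best_competitor` is purely illustrative and is not part of any Mistral tooling.

```python
# Scores copied from the benchmark table above (percent accuracy).
CATEGORIES = ["Overall", "Math", "Multilingual", "Scanned", "Tables"]
SCORES = {
    "Google Document AI": [83.42, 80.29, 86.42, 92.77, 78.16],
    "Azure OCR": [89.52, 85.72, 87.52, 94.65, 89.52],
    "Gemini-1.5-Flash-002": [90.23, 89.11, 86.76, 94.87, 90.48],
    "Gemini-1.5-Pro-002": [89.92, 88.48, 86.33, 96.15, 89.71],
    "Gemini-2.0-Flash-001": [88.69, 84.18, 85.80, 95.11, 91.46],
    "GPT-4o-2024-11-20": [89.77, 87.55, 86.00, 94.58, 91.70],
    "Mistral OCR 2503": [94.89, 94.29, 89.55, 98.96, 96.12],
}


def margins_over_best_competitor(scores, target):
    """Return {category: (best competitor, margin in points)} for `target`."""
    result = {}
    for i, category in enumerate(CATEGORIES):
        # Highest score in this column among every model except the target.
        rival, rival_score = max(
            ((name, row[i]) for name, row in scores.items() if name != target),
            key=lambda pair: pair[1],
        )
        result[category] = (rival, round(scores[target][i] - rival_score, 2))
    return result


if __name__ == "__main__":
    margins = margins_over_best_competitor(SCORES, "Mistral OCR 2503")
    for category, (rival, margin) in margins.items():
        print(f"{category}: +{margin:.2f} points over {rival}")
```

On these figures the lead is widest in Math and narrowest in Multilingual, which is consistent with the claim that the model is ahead in every category.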
One of the major strengths of Mistral OCR lies in its processing speed: it can handle up to 2,000 pages per minute on a single node. This efficiency allows companies to transform their vast document archives into usable knowledge bases in record time, especially since the API returns structured output (Markdown, JSON) that other computer systems can consume directly.
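
As an illustration of how such structured output might be consumed downstream, here is a minimal Python sketch that stitches per-page Markdown from a JSON response into a single document. The payload shape (a `pages` list whose entries carry `index` and `markdown` fields) is an assumption made for this example, not something the article specifies, and the sample response is invented data. No network call is made.

```python
import json


def pages_to_markdown(response_json: str) -> str:
    """Join per-page Markdown into one document, ordered by page index.

    Assumes a hypothetical payload of the form
    {"pages": [{"index": 0, "markdown": "..."}, ...]}.
    """
    payload = json.loads(response_json)
    pages = sorted(payload.get("pages", []), key=lambda page: page.get("index", 0))
    # Separate pages with a blank line so Markdown blocks do not run together.
    return "\n\n".join(page.get("markdown", "").strip() for page in pages)


# Invented sample response, deliberately out of order to exercise the sort.
SAMPLE_RESPONSE = json.dumps({
    "pages": [
        {"index": 1, "markdown": "| Model | Overall |\n|---|---|\n| Mistral OCR 2503 | 94.89 |"},
        {"index": 0, "markdown": "# Model Performance\n\nBenchmark summary."},
    ]
})

if __name__ == "__main__":
    print(pages_to_markdown(SAMPLE_RESPONSE))
```

Because the result is plain Markdown, it can be passed to an indexer or a language model without a separate layout-parsing step.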
Initial Use Cases
The versatility of Mistral OCR paves the way for a range of applications. According to Mistral AI, its beta version has been used in the following cases:
- Digitization of Scientific Research: Academic institutions have used Mistral OCR to convert scientific articles and journals into formats usable by AI engines;
- Cultural and Historical Heritage Preservation: Organizations have experimented with its use to digitize ancient manuscripts and other heritage documents;
- Customer Service Optimization: Companies have explored the possibility of converting manuals and documentation into indexed knowledge bases, thus reducing response times to customer inquiries;
- Transformation of Technical and Regulatory Literature: Companies across sectors (education, law, engineering) have tested Mistral OCR to structure data from presentations, technical reports, and regulatory documents.
Mistral OCR is already available on "La Plateforme" and will soon be accessible via the unicorn's cloud partners. Companies handling sensitive data can opt for on-premise deployment. It is also possible to try it for free on "Le Chat".