LLM: Large Language Models and Foundation Models

Tech family

Large language models (LLM) are neural networks trained on massive text corpora to predict the most likely continuation of a word sequence. Since 2022, they have been the driving force behind generative AI and the focal point of an unprecedented industrial race involving OpenAI, Anthropic, Google DeepMind, Meta, Mistral AI, Alibaba, DeepSeek, and around twenty other players. This page provides an overview of their architecture, the key players, the benchmark models in 2026, and the controversies surrounding them.

📰 Actualités récentes

Recent News

Large Language Models (LLM) continue to transform the landscape of artificial intelligence, establishing themselves as essential tools in various fields, from cybersecurity to medicine. Recently, DeepSeek unveiled an update to its R1 model, the DeepSeek-R1-0528, which enhances its reasoning, logic, and programming capabilities. This version, released on May 28, 2025, approaches the performance of flagship models from OpenAI and Google while reducing the hallucination rate, a recurring issue for LLMs. Meanwhile, Tencent introduced Hunyuan-T1, a reasoning model using an innovative hybrid architecture to compete with market leaders. These developments highlight a growing trend towards improving the reasoning capabilities of LLMs, a key element in their ability to integrate into complex and critical systems.

In the field of cybersecurity, LLMs demonstrate their potential by facilitating threat detection and analysis. A study from New York University highlights their ability to exploit large amounts of textual data to anticipate and respond to attacks, transforming cybersecurity into a more reactive and proactive sector. Models like SecureBERT, specialized in cybersecurity, show promising results, although their refinement remains a challenge for companies. This evolution towards specialized LLMs reflects a trend towards diversifying language model applications, addressing specific needs while improving their accuracy and reliability.

The enthusiasm for open-source LLMs also continues, with initiatives like those from the Allen Institute for AI, which launched Tülu 3 405B, a high-performance open-source model based on Llama 3.1. This model stands out for using reinforcement learning with verifiable rewards, improving its performance in complex tasks. Meanwhile, Mistral AI launched Mistral Small 3, a model optimized for latency, offering an open-source alternative to proprietary models. These initiatives reflect a desire to democratize access to LLMs while reducing inference costs, a crucial issue for expanding their adoption, especially in resource-constrained environments.

As large language models continue to develop, challenges remain, particularly in terms of inference cost and environmental impact. Microsoft recently introduced BitNet.cpp, an open-source framework that optimizes the inference of LLMs quantified to 1 bit, thereby reducing their carbon footprint. This innovation highlights the importance of sustainability in the evolution of LLMs, as model size and complexity continue to grow. Additionally, the integration of LLMs in fields such as medical diagnostics remains to be refined, with a study by UVA Health indicating that while LLMs may outperform doctors in certain tasks, their integration has not yet significantly improved overall diagnostic performance.

Complete guide

Architecture: From Transformer to Modern Models

The transformer architecture, from which all modern LLMs are derived, is built on two fundamental components. The first is the self-attention mechanism, which enables the model to compute, for each position in the text, a weighted combination of the representations of all other positions. This operation is inherently parallelizable, which explains why transformers have overtaken recurrent architectures (RNN, LSTM) that dominated NLP until 2017. The second component is the stacking of dozens of identical transformer layers (typically between 32 and 96 in state-of-the-art models), each refining the representation further.

Contemporary LLMs come in several architectural variants:

Dense models, where all parameters are activated at every inference step (historical GPT-4, Claude, Llama 3.1 405B);
Mixture of Experts (MoE) models, where only a few expert subnetworks are activated based on the token being processed, reducing inference cost for equivalent parameter counts (Mixtral, DeepSeek-V3, presumed GPT-4o);
Native multimodal models, which ingest and produce text, images, audio, and video within a unified representation space (Gemini, GPT-4o, Pixtral Large, Claude 3.5 Sonnet);
Reasoning models, which generate an explicit chain-of-thought before answering-DeepSeek-R1, OpenAI o1/o3, Tencent Hunyuan-T1, Gemini Thinking-at the cost of increased latency but with superior quality on tasks involving mathematics, logic, and programming.

Major Players in 2026

OpenAI remains the perceived market leader with ChatGPT, GPT-4o, GPT-4o mini, and the o1/o3 reasoning model family. The company, valued at several hundred billion dollars in 2026, is primarily funded by Microsoft and SoftBank. Its commercial strategy combines API access (pay-per-token), consumer products (ChatGPT Plus at $20/month), and enterprise solutions (ChatGPT Enterprise, Azure OpenAI Service). OpenAI has expanded its scope with OAI-SearchBot, its search crawler, and SearchGPT.

Anthropic, founded in 2021 by former OpenAI members including Dario and Daniela Amodei, has made safety its key differentiator. The Claude family (Haiku, Sonnet, Opus) is especially valued for writing, coding, and long-context reasoning. Anthropic is backed by Amazon, Google, and SoftBank. In May 2026, Anthropic confirmed it was renting a portion of xAI's Colossus 1 capacity for around $1.25 billion per month, illustrating the concentration of compute resources.

Google DeepMind has consolidated its AI activities under the Gemini brand since 2023. The Gemini family (Nano, Flash, Pro, Ultra, then Gemini 2.0 Flash in December 2024) is integrated into the search engine (AI Overviews) and the Workspace suite. Google benefits from a structural advantage through its control over training data (Web, YouTube, Books) and its TPU infrastructure.

Meta has bet on weights open with the Llama family (Llama 1 in February 2023, Llama 2 in July 2023, Llama 3 in April 2024, Llama 3.1 405B in July 2024). This strategy has democratized access to foundation models and fueled an ecosystem of derivative models (Vicuna, Tulu, sector-specific fine-tunes). However, Meta declined to sign the European GPAI code of conduct in July 2025 and temporarily suspended the release of Llama 3 multimodal in Europe.

Mistral AI, founded in Paris in April 2023 by Arthur Mensch, Guillaume Lample, and Timothée Lacroix, has established itself as the European champion. Its hybrid strategy combines open models (Mistral 7B, Mixtral 8x7B, Codestral Mamba, Mathstral, Ministral 3B/8B) and proprietary models (Mistral Large 2, Pixtral Large). Mistral has signed the GPAI code of conduct and forged strategic partnerships with NVIDIA (Mistral NeMo 12B), Dassault Systèmes, Capgemini, and SAP.

In China, Alibaba (Qwen family), Baidu (ERNIE 4.5, ERNIE X1), Tencent (Hunyuan-T1), and especially DeepSeek have caught up with and now challenge US labs. DeepSeek-V3 stunned the community in January 2025 with its quality at a training cost roughly 30 times lower than Western competitors. DeepSeek-R1, released soon after and updated in June 2025 (R1-0528), triggered a temporary drop in NVIDIA's stock price by questioning the premium on massive infrastructure.

Other players occupy specialized roles: xAI (Grok, Colossus infrastructure), Cohere (multilingual enterprise models, Aya 23), AI2 (Tülu 3 405B, fully open models), Aleph Alpha (Pharia-1-LLM for German), Black Forest Labs (FLUX-1 for text-to-image), LightOn (Paradigm for enterprise), Hugging Face (model hub, SmolLM2), OpenEuroLLM (open European consortium).

Reference Models in 2026

The landscape of leading LLMs in 2026 consists of roughly a dozen families, each with its own sizes and variants:

GPT-4o / GPT-4o mini (OpenAI) - native multimodal, reduced latency, 128k token context window. GPT-4o mini has become the economic standard for high-volume deployments.
o1 / o3 (OpenAI) - reasoning models with internal chain-of-thought, exceptionally strong in competitive mathematics (AIME, IMO) and programming (Codeforces).
Claude 3.5 Sonnet / Claude 3 Opus (Anthropic) - 200k context window, excellent for long-form writing and document reading.
Gemini 2.0 Flash / Gemini Ultra (Google DeepMind) - native multimodal, integrated with the Google ecosystem.
Llama 3.1 405B / Llama 3.3 (Meta) - leading open source dense models.
Mistral Large 2 / Pixtral Large (Mistral AI) - European, open weights for certain versions.
DeepSeek-V3 / DeepSeek-R1-0528 (DeepSeek) - Chinese open source, reasoning, spectacularly low training cost.
Qwen2.5 (Alibaba) - leading Chinese open source multilingual model, 1M token context window.
NOVA (Amazon) - proprietary foundation family announced in December 2024.
Phi-3 / Phi-3.5 (Microsoft) - efficient small models for edge deployment.
Hunyuan-T1 (Tencent) - Chinese reasoning model rivaling the state of the art.
Grok 3 (xAI) - real-time access to X, massive infrastructure.

Training and Costs

Training a state-of-the-art LLM requires vast resources. For GPT-4, public estimates mention a budget of around $100 million and several tens of thousands of H100 GPUs over three months. Llama 3.1 405B required 16,000 H100s and about 30 million GPU hours. Mistral Large 2 and Mixtral, at the other end of the European spectrum, were trained on significantly more modest budgets, demonstrating that competitive performance is achievable with careful data curation.

The scaling laws formalized by OpenAI and later refined by DeepMind (Chinchilla, 2022) long dictated the field: model quality improves predictably with the product of parameter count and training data volume, provided both are balanced. This equation was challenged at the end of 2024: ballooning budgets no longer translated into spectacular gains on open benchmarks, and focus shifted to other factors-data quality, post-training reasoning, agents, multimodality.

Compute infrastructure has become a major geopolitical issue. NVIDIA, with its near-monopoly on H100/H200/B200 GPUs, captures most of the value. US export controls restrict sales to China, prompting DeepSeek and Alibaba to optimize training for downgraded chips (H800). xAI built the Colossus data center in Memphis in 2024, integrating 100,000 H100s and then 200,000 H100/H200s in under a year-an industrial record.

Capabilities and Limitations

Modern LLMs master a wide array of tasks: writing, summarization, translation, code generation, dialogue, information extraction, classification, and processing both structured and unstructured documents. They have become standard components in numerous applications-conversational search engines, coding assistants (Copilot, Cursor), legal and medical agents, customer support systems, Office and Workspace productivity tools.

Their limitations are also well documented. LLMs hallucinate-they generate plausible but factually incorrect content-especially on niche topics, precise figures, and bibliographic references. They lack robustness against adversarial attacks (prompt injection, jailbreak), as shown by the EPFL study in December 2024 on LLM vulnerability to adaptive attacks. They can be manipulated to shift user opinions (EPFL study, April 2024). They consume significant amounts of energy and water-a growing concern for regulators and shareholders. Their biases reflect those of the training corpora, which are mainly English-language and North-Western in origin.

Open Source vs Proprietary

The divide between open and closed LLMs has structured the debate since 2023. Proponents of open models-Meta, Mistral, Hugging Face, AI2, DeepSeek, La Quadrature du Net-invoke technological sovereignty, the possibility of independent audits, academic dissemination, and industrial resilience. Opponents-Anthropic, OpenAI in certain respects-point to the risk of proliferating malicious uses (bio-terrorism, large-scale disinformation, fraud) and the impossibility of withdrawing a model once released.

The AI Act partially addresses this issue by granting partial exemptions to models whose parameters, architecture, and usage information are published. These exemptions do not apply to models with systemic risk (10²⁵ FLOPS of training). In 2026, the open source ecosystem is dominated by Llama, Mistral, DeepSeek, and Qwen, which collectively cover the majority of enterprise and academic use cases without dependency on a single provider's API.

Specialized and Vertical Models

Beyond general-purpose models, the ecosystem is diversifying into vertical models. In healthcare: H-optimus-0 by Bioptimus for assisted medical diagnosis, Pharia-1-LLM by Aleph Alpha for German, and specific fine-tunes for radiology and oncology. In law: Lefebvre Dalloz-Barreau de Paris assistants, Talan-Mutuelle Générale applications. In coding: Codestral (Mistral), Code Llama (Meta), DeepSeek-Coder. In finance: proprietary models at BNP Paribas, Crédit Agricole, JPMorgan. The efficient small models (SLM, Small Language Models) movement-Phi-3, Mistral Ministral, SmolLM2, Gemma 2 2B-targets embedded deployments (phones, cars, IoT) with acceptable quality at very low inference cost.

The year 2025 saw the emergence of AI agents as a new paradigm for LLM usage. Rather than responding to a single query, the agent chains actions (tool calls, web searches, file writing, code execution) to solve complex tasks. Gemini 2.0 Flash was unveiled in December 2024 as the model paving the way for this new product family. AI Decision Matrix by AI Builders provides a comparative assessment framework for IT leaders facing a proliferation of solutions.

Key Issues for 2026 and Beyond

Several dynamics to watch in the next 18 months:

GPAI compliance under the AI Act, now in force since August 2025;
the economics of inference, now dominant in volume compared to training, driving a shift toward MoE architectures and quantization (BitNet, vLLM, llm-optimizer);
the relationship with copyright law, particularly after the shelving of the Darcos law in France and ongoing case law on Meta-Llama, NYT-OpenAI, Getty-Stable Diffusion;
the race for reasoning models, with OpenAI o3, DeepSeek-R1, Gemini Thinking, and Hunyuan-T1 vying for leadership;
the multimodal convergence, as native models process text, images, video, and audio in a unified space;
the rise of agents and the associated questions of reliability (success rates on long tasks), safety (control over actions taken), and business model;
European sovereignty, embodied by Mistral, OpenEuroLLM, LightOn, Aleph Alpha, and the push to decouple inference compute from US providers through OVHcloud and new NVIDIA Tensor Core GPUs.

The evolution of LLMs is no longer just a parameter race. The winners will combine data quality, reinforcement post-training, controlled inference infrastructure, a licensing strategy aligned with their target market, and regulatory compliance. It is now as much an industrial, geopolitical, and legal issue as a scientific one.

Frequently asked questions

What is an LLM (large language model)?

An LLM is a very large neural network - from several billion to several hundred billion parameters - based on the transformer architecture. It is trained to predict the next unit (token) in a text, using hundreds of billions to several trillion tokens. From this simple task, complex capabilities emerge: dialogue, reasoning, code generation, translation.

What is the difference between an LLM and a foundation model?

A foundation model is a reusable AI model that serves as a base for many specialized applications via fine-tuning, RAG, or prompt engineering. An LLM is a type of foundation model specialized in language. However, the term also extends to multimodal models (image, audio, video) that share the same architectural and economic logic.

Which are the top-performing LLMs in 2026?

On public benchmarks: GPT-4o and o1/o3 (OpenAI), Claude 3.5 Sonnet and Claude 3 Opus (Anthropic), Gemini 2.0 Flash and Gemini Ultra (Google), Llama 3.1 405B (Meta), Mistral Large 2 (Mistral AI), DeepSeek-V3 and DeepSeek-R1 (DeepSeek), Qwen2.5 (Alibaba), Hunyuan-T1 (Tencent). None dominates across all dimensions; the choice depends on the use case (reasoning, latency, cost, languages, multimodality).

How much does it cost to train a state-of-the-art LLM?

For dense models with more than 70 billion parameters, budgets range from $5 million to $100 million depending on size and efficiency. GPT-4 is estimated at ~$100M, Llama 3.1 405B at ~$50M, DeepSeek-V3 at ~$5M (record efficiency). These figures only cover final training; including prior experimentation and post-training, total costs are 3 to 10 times higher.

What is a Mixture of Experts (MoE) model?

It is an architecture where the network is divided into several specialized expert subnetworks, and a router selects a few experts to activate for each token. This enables increasing the total number of parameters without proportionally increasing inference cost. Mixtral 8x7B, DeepSeek-V3, and GPT-4o (presumed) use this architecture.

Why did DeepSeek cause such a shock in January 2025?

DeepSeek-V3 and then DeepSeek-R1 demonstrated that it was possible to reach the level of the best proprietary American models with a training budget about 30 times lower and in open source. This challenged the advantage of massive infrastructures and triggered a temporary stock drop for NVIDIA, illustrating the fragility of the current valuation of the AI ecosystem.

Which are the European LLMs?

Mistral AI (Mistral Large 2, Mixtral, Codestral, Ministral, Pixtral) is the European leader. Aleph Alpha is developing Pharia-1-LLM in German. LightOn offers Paradigm for enterprise. Black Forest Labs publishes FLUX-1 for text-to-image. OpenEuroLLM is a European academic consortium. The project aims to provide a sovereign alternative to American and Chinese models.

Open source or proprietary: which to choose?

It depends on the use case. Proprietary (OpenAI, Anthropic, Gemini) offers the simplicity of a managed API and access to cutting-edge models. Open source (Llama, Mistral, DeepSeek) enables on-premise hosting, data sovereignty, model auditing, and avoiding vendor lock-in - at the cost of infrastructure and internal expertise. For regulated uses (healthcare, finance, defense), hosted open source often becomes the norm.

What is a reasoning model?

A reasoning model explicitly produces a chain-of-thought before answering, which drastically improves its performance on competitive mathematics, logic, and programming. OpenAI o1/o3, DeepSeek-R1, Tencent Hunyuan-T1, and Gemini Thinking are the main representatives. The inference cost increases (higher latency), but so does quality.

What are the main risks associated with LLMs?

Hallucinations (generation of factually false content), prompt injection and jailbreak (bypassing safeguards), bias (reflection of the training corpus), opinion manipulation (EPFL study 2024), energy and water consumption, private data leakage, industrial dependence on model and GPU providers. The AI Act addresses several of these risks for systemic-risk models.

How is an LLM evaluated?

Through public benchmarks (MMLU, GPQA, MATH, HumanEval, SWE-Bench, LiveCodeBench, MT-Bench), through blind human evaluations (Chatbot Arena), and through internal tests adapted to the use case. Open benchmarks quickly saturate: a model exceeding 90% on MMLU is no longer distinguishable from others. Real-task evaluation (writing, code production, long reasoning) remains essential.

What is the next step for LLMs?

AI agents - systems capable of autonomously chaining together complex actions - are the major focus for 2025-2027. Beyond that, the industry is working on long-term reliability (alignment, safety), inference efficiency, native multimodal convergence, continual learning, and infrastructure sovereignty. The question of training data remains fundamental: public web corpora are starting to saturate, paving the way for synthetic data and editorial partnerships.

on the same theme

Articles récents

4 articles liés à ce sujet

Alibaba Unveils Smart Cockpits, AI Glasses, and Strategic Partnerships at WAIC 2025

At the World Artificial Intelligence Conference 2025, Alibaba Cloud unveiled several applications of its AI language models, including a smart cockpit...

AI Market Commercial product

Aug 1, 2025 Read more →

DeepSeek-R1-0528: The Chinese Start-up Continues to Compete with American Giants with an Update to Its Flagship Model

The Chinese start-up DeepSeek has updated its R1 model, improving its performance in reasoning, logic, mathematics, and programming. This update, whic...

Tool for Data Scientists Commercial product

Jun 2, 2025 Read more →

When AI Becomes a Shield: How LLMs Concretely Change Cybersecurity

Large Language Models (LLMs) are gradually becoming prominent across all sectors, including the highly strategic field of cybersecurity. They enable l...

Security

May 15, 2025 Read more →

Tencent Launches Hunyuan-T1 Reasoning Model, Rivaling State-of-the-Art

Just a month after introducing its TurboS reasoning model, Tencent unveils Hunyuan-T1. With large-scale post-training, its reasoning capability is exp...

AI Market

Apr 19, 2025 Read more →

Statistiques

Articles totaux 4

Contenu mis à jour 5 days ago

By category

By sector