Anthropic Unveils Claude 4, Its Agents Optimized for Programming and Complex Tasks

TLDR : Anthropic unveils Claude Opus 4 and Claude Sonnet 4, AI models for coding and complex reasoning. Claude Opus 4, designed for long tasks, shows impressive performance on SWE and Terminal benchmarks, while Claude Sonnet 4 offers quick responses for daily applications.

Anthropic has just unveiled the new generation of its Claude models with the launch of Claude Opus 4 and Claude Sonnet 4. These models explicitly target advanced use cases in coding, complex reasoning, and agent-based automation, with performances that redefine the top tier of current LLMs.

Two Models, Two Uses, One Common Ambition

Claude Opus 4 is presented by Anthropic as the world's best coding model, with remarkable results on the SWE-bench (72.5%) and Terminal-bench (43.2%) benchmarks. Designed for long and complex tasks, it is intended to operate for several hours without performance loss, making it ideal for multi-agent architectures or heavy industrial workflows.

Claude Sonnet 4, a lighter yet robust version, replaces Sonnet 3.7 with a marked improvement on coding tasks (72.7% on SWE-bench). It is designed for everyday applications requiring quick but reliable responses, including for free users.

Benchmarks and Performance: Dominating Real Tasks

Claude 4 surpasses GPT-4 and Gemini 2.5 on real software engineering tasks (SWE-bench Verified).

Claude 4 stands out not only for its reasoning capabilities but also for its ability to stay on track without logical shortcuts. According to Anthropic, both models are 65% less likely to resort to "shortcuts" and infinite loops in critical agentic tasks than their predecessor.

New Technical Features

The Claude 4 models introduce "extended thinking" with integrated tools, allowing the AI to dynamically switch between reasoning and tool usage (such as a web search) during a task.

They can:

Use multiple tools in parallel
Retain information in local files, simulating a working memory
Generate reasoning summaries to improve the readability of long chains of thought

Claude Code: An Autonomous Development Copilot

Already in testing on GitHub, Claude Code is now available in a stable version. This system offers native integrations for VS Code and JetBrains, with code suggestions displayed directly in your files.

An SDK is also available to develop your own agents based on Claude Code, with a key example: a GitHub integration allowing Claude to automatically act on PRs, CI/CD errors, or complex refactorings.

Availability and Pricing

Both models are available on:

Anthropic API
Amazon Bedrock
Google Vertex AI

💰 Pricing:

Claude Opus 4: $15 / $75 per million tokens (input/output)
Claude Sonnet 4: $3 / $15

👉 For more information or to test the models: claude.ai

Between Gemini 2.5 Pro, OpenAI Codex and Claude 4, LLM publishers seem eager to accelerate assistance in programming, a high-value task for LLMs.

Translated from Anthropic dévoile Claude 4, ses agents optimisés pour la programmation et les tâches complexes

To better understand

What is the 'extended thinking' with integrated tools in the Claude 4 models?

'Extended thinking' allows the AI to dynamically switch from reasoning to using external tools like web search, thus enhancing its effectiveness in complex tasks.

How does Anthropic integrate into the tech ecosystem through platforms like Amazon Bedrock and Google Vertex AI?

Anthropic strategically positions itself by integrating into major platforms like Amazon Bedrock and Google Vertex AI, allowing extended access to its models through various cloud solutions, thus facilitating adoption by diverse users and industries.