Anthropic Lanza Claude Sonnet 4.5, new reference in coding and agents AI


By Canuto

Anthropic presents Claude Sonnet 4.5, a model designed for advanced coding, execution of agents and extended use of computers; It arrives with new tools for developers, security improvements and the promise to maintain focus on complex tasks for more than 30 hours.
***

  • Claude Sonnet 4.5 leads in computer use tests (Osworld 61.4%) and improves coding yield compared to Sonnet 4.
  • Includes updates in Claude Code, the extension for Chrome, an SDK for agents and new memory tools in the API.
  • It is launched under ASL-3 protections; CBRN classifiers have reduced false signals by 10x since its initial version.

Anthropic announced on September 29, 2025 the launch of Claude Sonnet 4.5, which defines as its “most powerful coding model in the world” and the most capable model to build complex agents and use computers in real tasks. According to the company, Sonnet 4.5 incorporates substantial improvements in reasoning and mathematics, and arrives accompanied by updates in products and tools for developers and users.

The company published technical details and evaluation results that reflect advances in Real Benchmarks. Anthropic indicates that Sonnet 4.5 keeps the focus for more than 30 hours in complex and multiple steps tasks, and reports notable profits in tests such as Osworld and Swe-Bench Verified.

The arrival of the new model includes concrete improvements in its applications and on the developer platform. Among them: control points in Claude Code, a renewed interface of the terminal, a native extension for vs Code, a context editing function and a memory tool in the Claude API designed for longer duration and complexity agents.

Anthropic also states that the execution of code and the creation of files – calculation supplies, slides and documents – can now be carried out directly within the conversation in the Claude Apps. The Claude extension for Chrome is now available for Max users who were part of the waiting list.

Performance results and benchmarks

In public evaluations and developed by Anthropic herself, Claude Sonnet 4.5 reached 61.4% in Osworld, a test that measures real -world computer tasks. To compare, four months ago Sonnet 4 led with 42.2%. Anthropic highlights this advance as a significant jump in the use of the computer.

In Swe-Bench Verified, aimed at coding skills in real scenarios, Sonnet 4.5 reached 77.2% in a configuration reported with a 200k reasoning budget in a set of 500 problems. The company adds that a 1M Tokens context configuration reaches 78.2%, although the main result was 200k for methodological reasons mentioned in its notes.

Anthropic also details additional metrics and frames: Terminal-Bench using Terminus 2, τ2-Bench with extended reasoning and Prompt adjustments for certain failure modes, Aime with sampling and 64K Reasoning Tokens in Python, and Mmmlu averaged in 14 non-English languages ​​with extended reasoning. The company also cites public data of comparative with OpenAI and Gemini results, using the sources cited in their foot notes.

The company affirmed practical observations: Sonnet 4.5 can execute parallel actions effectively, for example by launching multiple BASH commands simultaneously, and maintains coherence in large code bases for prolonged effort.

Client testimonies and use cases

Anthropic included multiple testimonies of customers and partners who describe impacts in development, security, design and finance. A partner said that “we see an avant -garde coding yield … with significant improvements in wider horizon tasks” and that this improvement confirms why many developers choose Claude for complex problems.

Github Copilot reported initial evaluations with “significant improvements in reasoning of multiple steps and code understanding”, which helps Copilot to handle more complex agentic tasks. Another client said that Sonnet 4.5 reinvented development speed by understanding patterns of his code to deliver precise implementations.

In security, Anthropic cited a team that reduced the average vulnerabilities entry time by 44% and improved 25% accuracy with their security agents HAI. In litigation, a testimony indicated that Sonnet 4.5 can analyze complete information cycles and synthesize drafts of judicial opinion with high quality.

Design companies such as Canva and Figma reported benefits in long -context tasks and Iterative generation of PROMPTS. Anthropic also mentioned emerging use in red penetration tests, where Sonnet 4.5 generates creative attack scenarios that accelerate the understanding of offensive tactics and help reinforce defenses.

ALIGNMENT, SAFETY AND PROTECTIONS

Anthropic presents Sonnet 4.5 as his “most aligned border model” to date. The company indicates that, thanks to improved capabilities and security training, it has reduced problematic behaviors such as adulation, deception and search for power, as well as the tendency to encourage illusory thought.

The launch is carried out under the security level of the 3 (ASL-3) of the Anthropic framework itself. This level includes classifiers designed to detect potentially dangerous entries and outputs, with emphasis on risks related to chemical, biological, radiological and nuclear weapons, known by its CBRN acronym.

Anthropic acknowledges that these classifiers can generate false signals and explain that it has reduced these signals by a ten factor from the original description of the classifiers and by a two -factor from the launch of Claude Opus 4 on May. The company has also enabled the possibility of continuing interrupted conversations with Sonnet 4, a model with lower CBRN risk, while they continue to refine the selectivity of the filters.

The technical documentation and a system card accompany the model, including evaluations that for the first time use mechanical interpretability techniques for safety and alignment tests.

Developer tools: Claude Agent SDK and updates

Anthropic launched the Claude Agent SDK, described as the same infrastructure that Claude Code feeds. According to the company, this SDK incorporates memory solutions in long tasks, permissions systems that balance autonomy and user control, and subacids coordination with shared objectives.

The firm explains that Claude Code spent more than six months and now offers that platform to developers to build agents capable for a variety of tasks, not just coding. Claude Developer Platform’s updates, including Agent SDK, are available for all developers.

In addition, Anthropic recommends updating Claude Sonnet 4.5 in all uses: APPS, API and Claude Code. Code execution and file creation are available in all paid plans in Claude Apps. For developers, Sonnet 4.5 is available via Claude API with the Claude-SONNET-4-5 identifier.

The reported price remains the same as Sonnet 4: USD $ 3 / USD $ 15 per million tokens, according to the company.

Preview research and availability

Anthropic launches a temporary previous view called “Imagine with Claude” that generates real -time software without pre -written code; The demonstration responds and adapts to user requests. The company offers that experiment for Max subscribers for five days at Claude.Ai/imagine.

Complete technical notes, the system card and evaluation documentation are available on the model page and in Anthropic engineering publications. There are detailed evaluation frameworks such as Terminal-Bench, τ2-Bench, Aime and MMMLU, as well as public references used for comparisons with other models.

In summary, Anthropic positions Claude Sonnet 4.5 as an integral advance in coding and agentic capabilities, with performance improvements, new tools for reinforced security developers and safeguards. The model is globally available today and the company invites developers and users to test the novelties.


Original image of Diariobitcoin, created with artificial intelligence, for free use, licensed under public domain.

This article was written by an AI content editor and reviewed by a human editor to guarantee quality and precision.

WARNING: Diariobitcoin offers informative and educational content on various topics, including cryptocurrencies, AI, technology and regulations. We do not provide financial advice. Cryptactive investments are high risk and may not be adequate for all. Investigate, consult an expert and verify the applicable legislation before investing. I could lose all its capital.

Subscribe to our newsletter

Similar Posts