Anthropic Claude Sonnet 4.5 released, focuses on coding and agent capabilities

Anthropic's Claude Sonnet 4.5 excels in coding and computer use, leading benchmarks. It introduces the Agent SDK, coding checkpoints, and safety upgrades. Pricing remains at USD 3/USD 15 per million tokens.

Punam Singh

30 Sep 2025 11:07 IST

Anthropic introduced Claude Sonnet 4.5 on 29 September 2025, positioning the new model as a significant step forward, particularly in coding, computer use, and complex agent building. The company concurrently announced a suite of product updates for its developer platform and consumer applications.

Improved Coding and Computer Use

The new model shows significant gains in technical benchmarks. On the SWE-bench Verified evaluation, which tests real-world software coding skills, Claude Sonnet 4.5 achieves state-of-the-art results. The company says it has observed the model maintaining focus on demanding, multi-step tasks for more than 30 hours.

Sonnet 4.5 also represents a substantial jump in computer use. The model recorded a 61.4% score on the OSWorld benchmark, which assesses an AI's ability to complete real-world computer tasks. That surpasses the previous model, Sonnet 4, which held the lead four months earlier with 42.2%. The upgraded computer-use capabilities power features like the Claude for Chrome extension, enabling the model to navigate websites, fill spreadsheets, and complete tasks directly in a browser.

Beyond coding and computer interaction, the model displays improved performance in broader academic areas, including reasoning and math. Experts across finance, law, medicine, and STEM fields reportedly found Sonnet 4.5 offered much better domain-specific knowledge and reasoning compared to older models, including Opus 4.1.

Product Upgrades for Developers and Users

Anthropic released several new features alongside Sonnet 4.5:

Claude Code: This environment gains checkpoints, a feature allowing users to save progress and roll back to a prior state. The terminal interface received a refresh, and Anthropic shipped a native VS Code extension.
Claude API: New tools include a context editing feature and a memory tool, designed to help agents run longer and manage increased complexity.
Claude Apps: Consumer applications now feature code execution and file creation (for spreadsheets, slides, and documents) directly within the conversation interface. The Claude for Chrome extension is now available to Max subscribers who joined the waitlist.

Developers also received access to the underlying tools Anthropic uses to build Claude Code. The Claude Agent SDK packages the core infrastructure that powers the company's frontier products, so developers can use it to build their own agents.

Focus on Safety and Alignment

Anthropic describes Sonnet 4.5 as its "most aligned frontier model yet." The company reports large improvements across several areas of alignment compared to prior Claude models. This includes a reduction in concerning behaviours such as sycophancy, deception, and power-seeking.

For the model's new computer use and agentic functions, the company also made progress in defending against prompt injection attacks, a serious security risk for these capabilities.

Anthropic is releasing Sonnet 4.5 under AI Safety Level 3 (ASL-3) protections. This framework involves safeguards, including classifiers that aim to detect potentially dangerous inputs and outputs, specifically those related to chemical, biological, radiological, and nuclear (CBRN) weapons. The company has worked to reduce false positives from these classifiers, noting a tenfold reduction since their original description and a factor of two since the release of Claude Opus 4 in May.

The model is a drop-in replacement for prior Sonnet versions. Developers can access it through the Claude API using the model ID claude-sonnet-4-5. Pricing is unchanged from Claude Sonnet 4, at $3 per million input tokens and $15 per million output tokens.
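
For readers budgeting a workload, the per-million-token pricing can be turned into a rough cost estimate. The sketch below is illustrative only: it assumes the $3 figure applies to input tokens and the $15 figure to output tokens, and it ignores any discounts or surcharges a real bill might include. The function name and constants are hypothetical and are not part of any Anthropic library.

```python
from decimal import Decimal

# Model ID as quoted in the article.
MODEL_ID = "claude-sonnet-4-5"

# Assumed rates: USD per one million tokens (input / output).
INPUT_USD_PER_MILLION = Decimal("3")
OUTPUT_USD_PER_MILLION = Decimal("15")
TOKENS_PER_MILLION = Decimal("1000000")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return an approximate USD cost for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example: a job that reads 200,000 tokens and writes 50,000 tokens.
    cost = estimate_cost_usd(200_000, 50_000)
    print(f"{MODEL_ID}: about ${cost} for this job")
```

Under these assumptions, output tokens cost five times as much as input tokens, so long generated responses tend to dominate the bill even when prompts are large.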

A temporary research preview called "Imagine with Claude" lets Max subscribers experiment with the model generating software on the fly. The preview is available for five days.