Just Now, Anthropic Released Sonnet 5, Performance Close to Opus 4.8, but Not Necessarily Cheaper

marsbit · Published 2026-07-01 · Last updated 2026-07-01

Summary

Anthropic has officially released Claude Sonnet 5, describing it as the most "agentic" Sonnet model to date. It can plan, use tools such as browsers and terminals, and carry out tasks autonomously at a level that previously required larger, more expensive models. Its performance in reasoning, tool use, programming, and knowledge work has improved significantly over Sonnet 4.6 and now approaches that of Opus 4.8. Evaluation results indicate that Sonnet 5 offers better cost efficiency than its predecessor at medium "effort" levels, and at higher effort levels it can match Opus 4.8 on some tasks. On safety, Sonnet 5 is better than Sonnet 4.6 at refusing malicious requests and resisting prompt injection attacks, though it shows a slightly higher rate of policy-violating behavior than Opus 4.8 and Mythos Preview. Its cybersecurity capabilities remain weaker than those of these models. Notably, Sonnet 5 uses a new tokenizer: the same input text now produces approximately 1.0 to 1.35 times as many tokens, depending on content. To offset this, Anthropic is offering a promotional launch price of $2 per million input tokens and $10 per million output tokens until August 31, 2026, after which standard pricing will be $3/$15 per million tokens. However, some external analysis suggests that, because of increased token usage, the actual cost per task for Sonnet 5 may be higher than for both Sonnet 4.6 and Opus 4.8.

Just now, Anthropic officially released the new model Claude Sonnet 5, calling it "the most Agentic Sonnet model to date." It can formulate plans, use tools like browsers and terminals, and operate autonomously at a level that, just months ago, required larger and more expensive models.

Sonnet 5 shows significant performance improvements over Sonnet 4.6 in reasoning, tool use, programming, and knowledge work, coming closer to Opus 4.8 while being more affordable.

According to the official announcement, for developers, the AI Agent era truly began with Sonnet-level models: Claude Sonnet 3.5, 3.6, and 3.7 were among the first models to show impressive capabilities in programming and tool use. However, in recent times, the most notable advancements in Agentic capabilities have primarily appeared in Opus-level models.

Claude Sonnet 5 significantly narrows this gap: its performance is now close to Opus 4.8, but at a lower price. Compared to its predecessor Sonnet 4.6, it shows substantial improvements across key dimensions of Agentic performance like reasoning, tool use, programming, and knowledge work. A detailed comparison is shown below:

The chart below compares Sonnet 5, Sonnet 4.6, and Opus 4.8 on the Agentic search benchmark BrowseComp and the computer use benchmark OSWorld-Verified, at different "effort levels":

  • Sonnet 5 (orange line) shows a clear performance improvement over Sonnet 4.6 (gray line) and offers a broader range of cost-performance options than Opus 4.8 (yellow line).
  • At medium effort levels, Sonnet 5 significantly improves cost efficiency; at higher effort levels, its performance can match Opus 4.8 on certain tasks.
  • Between Sonnet 5 and Opus 4.8, users can flexibly adjust the effort level based on specific tasks to find the optimal balance between cost and performance for their needs.

The cost-performance curve at different effort levels is shown above. The previous best Sonnet model (Sonnet 4.6) was far from matching Opus 4.8. Sonnet 5 provides a wider range of cost-performance options than Sonnet 4.6 and can reach the capability level of Opus 4.8 in some scenarios. The Sonnet 5 pricing shown in the chart is input $3 / million tokens, output $15 / million tokens. With the introductory pricing valid until August 31st (input $2 / million tokens, output $10 / million tokens), the actual cost of Sonnet 5 is even lower than shown. Opus 4.8 pricing is input $5 / million tokens, output $25 / million tokens.

Feedback from Anthropic's early access partners has been consistent: Sonnet 5 is more capable as an autonomous agent (more agentic) than its predecessor. Testers say it completes complex tasks on which previous Sonnet models would get stuck, proactively checks its own outputs without explicit prompting, and does all this agentic work at a highly attractive price.

Safety Evaluation

Anthropic's pre-deployment safety evaluations found that Sonnet 5 shows overall improvement compared to Sonnet 4.6. In terms of autonomous agent safety, the model performs better at rejecting malicious requests and resisting hijacking attempts in prompt injection attacks. The model's hallucination rate and sycophancy rate are both lower than Sonnet 4.6. In the automated behavioral audit (testing a wide range of misbehaviors, such as assisting abuse and deception), Sonnet 5 scored lower (i.e., was safer).

However, compared to the more capable Opus 4.8 and Claude Mythos Preview, it did show a slightly higher rate of misbehavior in this audit.

The chart above shows the misbehavior rate in the automated behavioral audit, which tests a large number of undesirable behaviors across various scenarios and contexts (see Section 6.4 of the Sonnet 5 System Card for the complete list and results per behavior). Sonnet 5's overall misbehavior rate is lower than Sonnet 4.6, but higher than Mythos Preview and Opus 4.8.

Anthropic states that they did not specifically train Sonnet 5 for cybersecurity tasks. It can perform some routine, harmless web tasks, but in evaluations of potentially dangerous cyber skills (like developing software exploit code), its performance is significantly weaker than models like Opus 4.8 and Mythos 5.

The chart below shows scores from one such evaluation, testing the model's ability to develop an exploit for a Firefox browser vulnerability. Sonnet 5 consistently failed to develop a fully functional exploit, but its partial success rate was slightly higher than Sonnet 4.6's. This improvement likely stems from general intelligence gains rather than specific training.

The chart above shows scores for models successfully developing an exploit for a software vulnerability in Firefox 147 (this evaluation was developed in collaboration with Mozilla; all vulnerabilities were patched in Firefox 148). For each model, the left bar indicates how often the model (without safety guardrails) developed a functional exploit, and the right bar indicates the frequency of partial success. Both Sonnet models failed to develop a functional exploit (score 0.0% each); Sonnet 5's partial success rate is slightly higher than Sonnet 4.6. Both Sonnet models' cyber capabilities are significantly weaker than Opus 4.8 and Mythos 5.

Because Sonnet 5 is slightly more capable at these tasks than its predecessor, Anthropic has enabled cybersecurity guardrails by default. These guardrails, which can detect and block dangerous cyber use in real time, are the same as those in Claude Opus 4.7 and 4.8. Because Anthropic judged Sonnet 5's overall cybersecurity risk to be low, they are less strict than the guardrails enabled for Fable 5, which block a wider range of cybersecurity tasks.

Anthropic's complete assessment report for Sonnet 5 across multiple safety and capability evaluations can be found in the Claude Sonnet 5 System Card.

Pricing

Starting today, Claude Sonnet 5 is generally available across all channels. To celebrate the launch, Anthropic is offering a limited-time introductory price:

  • From now until August 31, 2026: Input $2 / million tokens, Output $10 / million tokens
  • Standard pricing after that: Input $3 / million tokens, Output $15 / million tokens
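The two price tiers above can be turned into a quick per-request estimate. The following is a minimal sketch, not an official calculator: the function name `request_cost` and the sample token counts are hypothetical, and it assumes August 31, 2026 is the last day on which the introductory price applies.

```python
from datetime import date

# Prices in USD per million tokens, as quoted in the article.
INTRO_END = date(2026, 8, 31)  # assumption: the introductory price applies through this day
INTRO = {"input": 2.0, "output": 10.0}
STANDARD = {"input": 3.0, "output": 15.0}


def request_cost(input_tokens: int, output_tokens: int, on: date) -> float:
    """Return the estimated USD cost of one request made on the given date."""
    prices = INTRO if on <= INTRO_END else STANDARD
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
print(request_cost(200_000, 20_000, date(2026, 7, 15)))  # introductory price
print(request_cost(200_000, 20_000, date(2026, 9, 1)))   # standard price
```

For this sample workload the estimate rises from about $0.60 to about $0.90 once standard pricing takes effect, a 50% increase for the same token counts.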

At the same time, Anthropic announced across-the-board rate limit increases for Chat, Cowork, Claude Code, and the Claude platform to accommodate the higher token consumption of higher "effort level" modes.

Important Notes

Cybersecurity Verification

Sonnet 5 has been included in Anthropic's "Cybersecurity Verification Program." The program is now available for use on the following platforms:

  • Claude Native Platform
  • Claude Platform on AWS
  • Claude in Microsoft Foundry (hosted on Azure and by Anthropic)

Claude on Google Vertex will also support it soon.

Organizations already enrolled in the program automatically receive equivalent access for Sonnet 5; no re-application is needed. If your cybersecurity work requires fewer safety guardrail restrictions, Anthropic recommends using Claude Opus 4.8.

Tokenizer Update & Pricing Explanation

Sonnet 5 is an upgrade to Sonnet 4.6 but uses a new tokenizer to optimize text processing performance (similar to the tokenizer change introduced with Claude Opus 4.7).

As a result, the same input text now maps to roughly 1.0x to 1.35x as many tokens, depending on the content type.

Accordingly, the introductory price has been set so that the overall cost of switching to Sonnet 5 remains roughly the same for users.
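To see how the token increase interacts with the per-token price, one can express each price per million tokens as the old tokenizer would have counted them. This is a rough sketch under stated assumptions: `effective_price` is a hypothetical helper, only input tokens are considered, and the multiplier range is the 1.0x to 1.35x figure quoted above.

```python
def effective_price(price_per_mtok: float, token_multiplier: float) -> float:
    """USD per million tokens as counted by the OLD tokenizer.

    If the same text now yields `token_multiplier` times as many tokens,
    the cost of that text scales by the same factor.
    """
    return price_per_mtok * token_multiplier


INTRO_INPUT = 2.0     # USD per million input tokens (introductory)
STANDARD_INPUT = 3.0  # USD per million input tokens (standard)

for multiplier in (1.0, 1.35):
    intro = effective_price(INTRO_INPUT, multiplier)
    standard = effective_price(STANDARD_INPUT, multiplier)
    print(f"x{multiplier:.2f}: intro ${intro:.2f}, standard ${standard:.2f}")
```

At the top of the range, the introductory input price works out to about $2.70 per million old-tokenizer tokens, which is still below the $3 list price, while the standard price works out to about $4.05. Output costs depend additionally on how many tokens the model generates, which this sketch does not model.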

Rate Limit Adjustment Explanation

As early as April 26, 2026, Anthropic had already increased rate limits for Sonnet and Haiku models across all usage tiers and simplified the native Claude platform plans into three tiers: Start, Build, Scale.

With this update, Anthropic has further increased rate limits for Chat, Cowork, Claude Code, and the Claude platform to accommodate the higher token consumption from higher "effort level" modes.

You can view your current tier and specific limits in the Claude Console or consult the documentation for more details.

Benchmark Score Correction Explanation (Supplemental)

  • Humanity’s Last Exam: Anthropic updated the scoring model for this benchmark and, based on that, revised Sonnet 4.6's score to 34.6% (without tools) and 46.8% (with tools). Therefore, this score differs from the data reported in the Sonnet 4.6 launch blog. This is noted for clarity.
  • OSWorld-Verified: Anthropic optimized the execution of this benchmark to better reflect the model's performance in real-world scenarios and revised Sonnet 4.6's score to 78.5%. This is also the reason this score differs from the data in the Sonnet 4.6 launch blog.

Developer Hands-on Feedback

As soon as Claude Sonnet 5 was released, people started testing it.

User Nicolas Bustamante said that one thing he likes about Sonnet 5 is that it's fast and optimized for Agents. "My favorite example is browser usage: fast and safe."

According to system card results, the success rate of prompt injection attacks in browser usage scenarios is only 0.93% for Sonnet 5, while it's 31.5% for Opus 4.8 and 50.7% for Sonnet 4.6.

However, some users commented, "Too expensive."

According to an analysis by Artificial Analysis, on the Intelligence Index, the run cost for Claude Sonnet 5 is $2.29 per task, an increase of about 2x compared to Sonnet 4.6, and also about 15% higher than Claude Opus 4.8. This cost increase is entirely driven by higher token usage, making Claude Sonnet 5 one of the most expensive models to run, second only to Claude Fable 5.
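The ratios quoted above imply approximate per-task costs for the other two models. The sketch below is a back-of-the-envelope calculation only: the derived values are not figures reported by Artificial Analysis, and they inherit the imprecision of "about 2x" and "about 15%".

```python
SONNET_5_COST = 2.29       # USD per task, as reported in the article
RATIO_VS_SONNET_46 = 2.0   # "about 2x" Sonnet 4.6
RATIO_VS_OPUS_48 = 1.15    # "about 15% higher" than Opus 4.8

# Implied (not reported) per-task costs, obtained by dividing out the ratios.
implied_sonnet_46 = SONNET_5_COST / RATIO_VS_SONNET_46
implied_opus_48 = SONNET_5_COST / RATIO_VS_OPUS_48

print(f"Implied Sonnet 4.6 cost per task: ~${implied_sonnet_46:.3f}")
print(f"Implied Opus 4.8 cost per task:   ~${implied_opus_48:.3f}")
```

Under these assumptions, Sonnet 4.6 comes out somewhere around $1.1 to $1.2 per task and Opus 4.8 just under $2, which is consistent with the claim that the lower per-token price does not translate into a lower cost per task.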

What about you? How do you feel about the new model? Feel free to leave comments and discuss below!

Reference Links:

https://x.com/claudeai/status/2072017450611142835

https://www.anthropic.com/news/claude-sonnet-5

https://x.com/ArtificialAnlys/status/2072062595482456431

This article is from the WeChat public account "Almost Human" (ID: almosthuman2014), author: following AI

Related Questions

Q: What is Anthropic's newly released model and what are its key capabilities?

A: Anthropic has released Claude Sonnet 5, described as the 'most agentic Sonnet model to date.' It is capable of making plans, using tools like browsers and terminals, and autonomously operating at a level previously requiring larger, more expensive models.

Q: How does the performance and pricing of Claude Sonnet 5 compare to Claude Opus 4.8 and Sonnet 4.6?

A: Sonnet 5 offers significant performance improvements over Sonnet 4.6 in reasoning, tool use, coding, and knowledge work, bringing its performance close to Opus 4.8. While its standard pricing ($3 input, $15 output per million tokens) is lower than Opus 4.8 ($5 input, $25 output), some analysis indicates its cost per task can be higher due to increased token usage.

Q: What safety improvements were noted for Sonnet 5 compared to its predecessor?

A: Anthropic's safety assessment found Sonnet 5 improves upon Sonnet 4.6. It is better at refusing malicious requests, resisting prompt injection hijacks, and has lower hallucination and sycophancy rates. Its score on an automated behavior audit for misuse was also lower (safer), though slightly higher than Opus 4.8 and Mythos Preview.

Q: What is the key change in Sonnet 5's tokenizer and how does it affect users?

A: Sonnet 5 uses a new tokenizer that increases the number of tokens for the same input by a factor of approximately 1.0 to 1.35, depending on content type. To offset this and keep transition costs similar, Anthropic introduced a promotional launch price ($2 input, $10 output per million tokens until August 31, 2026).

Q: Why does the article's title say Sonnet 5 is 'not necessarily cheaper' despite its lower token price?

A: The title highlights that although the per-token price for Sonnet 5 is lower than Opus 4.8, the new tokenizer results in more tokens being used for the same tasks. Independent analysis (e.g., from Artificial Analysis) suggests the total cost per task for Sonnet 5 is about 15% higher than Opus 4.8 and roughly double that of Sonnet 4.6, making it one of the more expensive models to run in practice.
