Breaking News! Anthropic Calls for a Universal Pause in AI Research

marsbit | Published on 2026-06-05 | Last updated on 2026-06-05

Abstract

Anthropic warns of AI self-evolution, reporting that over 80% of its internal code is now written by its AI, Claude. Productivity has surged, with engineers merging 8x more code than in 2024. Claude's performance on complex, open-ended tasks jumped from 26% to 76% success in six months, nearing human parity. The company introduces a new metric: "AI task duration." In 2024, AI handled 4-minute tasks; by 2026, it manages 16-hour tasks, with capability doubling every 4 months. Claude also reviews code, catching bugs that previously caused outages, and significantly outperforms humans in research tasks like optimizing code (52x speedup) and conducting AI safety experiments. Anthropic outlines three potential futures: 1) Progress plateaus, 2) AI accelerates but humans remain in control, or 3) AI achieves full recursive self-improvement (RSI), designing its own successors. This final path could revolutionize fields like medicine but also risks catastrophic alignment failure if control is lost. The call echoes similar concerns from OpenAI. Anthropic proposes a coordinated pause on AI development—if a verifiable mechanism to ensure all labs comply can be established.

By Jay, published by QbitAI

Major Discovery: The Self-Evolution of AI Has Begun.

This is the provocative thesis Anthropic just laid out in a lengthy blog post.

“Our internal data suggests Claude is accelerating AI development, potentially on a path of Recursive Self-Improvement (RSI).”

This is not mere “scaremongering.” A look at the article shows Anthropic is speaking with hard data—

As of May this year, over 80% of Anthropic's code was written by Claude.

Before Claude Code was released, that figure was only in the single digits.

Meanwhile, Anthropic engineers now merge 8 times as much code per day as they did in 2024.

Even more important is quality—

On the most open-ended, ambiguous programming tasks where even the form of the answer is uncertain, Claude's success rate is now 76%, up from just 26% six months ago.

A 50-percentage-point leap. In half a year.

Many engineers within Anthropic already feel the quality of Claude's code is on par with humans.

It is expected to surpass humans within the year.

Anthropic also emphasizes that if this trend continues, it is entirely possible for AI to design and build the next generation of AI.

This could utterly transform society, bringing immense benefits in healthcare, technology, and the economy. But it could also compound alignment issues, ultimately leading to a loss of control.

Therefore, Anthropic is leading the call:

“If there exists a verifiable mechanism that ensures AI labs are indeed not covertly racing ahead, we are willing to slow down, even pause.”

Beyond this, Anthropic's blog post contains many other interesting perspectives and facts.

Below is a version organized for easier reading.

Enjoy.

Anthropic's Long-Form Thesis

AI's Moore's Law Has Arrived

Anthropic created a new metric called “Duration of tasks an AI can complete autonomously.”

In March 2024, Claude Opus 3 could handle software tasks that would take a human roughly 4 minutes.

One year later, Claude Sonnet 3.7: 1.5 hours.

Another year, Claude Opus 4.6: 12 hours.

And the latest, Mythos, in internal testing shows:

It can handle tasks that would take a human “at least” 16 hours, already hitting the upper limit the METR testing framework can measure.

The doubling time has shortened from about 7 months to about 4 months.

If the trend holds, by 2027 Claude could handle tasks that would take a human several weeks.
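To make the doubling arithmetic concrete, here is a minimal Python sketch. It uses only the figures quoted above (4 minutes, 16 hours, a 4-month doubling time). The 40-hour work week used to convert hours into weeks is an assumption for illustration and does not come from Anthropic's post.

```python
import math

# Figures quoted in the article.
OPUS_3_MINUTES = 4.0          # Claude Opus 3, March 2024
MYTHOS_HOURS = 16.0           # Mythos, "at least" 16 hours
DOUBLING_MONTHS = 4.0         # current doubling time per the article

# Assumption (not from the article): a work week is 40 hours.
HOURS_PER_WORK_WEEK = 40.0


def doublings_between(start: float, end: float) -> float:
    """How many times `start` must double to reach `end` (same units)."""
    return math.log2(end / start)


def extrapolate_hours(current_hours: float, months_ahead: float,
                      doubling_months: float) -> float:
    """Project the task horizon forward, assuming a constant doubling time."""
    return current_hours * 2 ** (months_ahead / doubling_months)


n_doublings = doublings_between(OPUS_3_MINUTES, MYTHOS_HOURS * 60)
horizon_in_a_year = extrapolate_hours(MYTHOS_HOURS, 12, DOUBLING_MONTHS)
work_weeks = horizon_in_a_year / HOURS_PER_WORK_WEEK

print(f"Doublings from 4 minutes to 16 hours: {n_doublings:.1f}")
print(f"Projected horizon in 12 months: {horizon_in_a_year:.0f} hours")
print(f"That is about {work_weeks:.1f} forty-hour work weeks")
```

Under these assumptions, three more doublings take 16 hours to 128 hours, or roughly 3.2 forty-hour work weeks, which is in line with the "several weeks" figure.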

Claude Writes Most of Anthropic's Code

As of May 2026, over 80% of the code in Anthropic's codebase is written by Claude.

Before Claude Code's release, this number had consistently been in the single digits.

This shift is also reflected in engineers' workflows.

In Anthropic's first four years, the lines of code merged per engineer per day remained largely constant.

In 2025, when Claude began writing its own code, the merge count suddenly skyrocketed.

Now, in Q2 2026, engineers are merging 8 times more code per day than in 2024.

But with more code, is the quality diluted?

Anthropic says that over the past year, engineers have needed to correct Claude less and less.

This is evident in benchmarks, as shown in the chart below.

Across all difficulty levels of tasks, Claude's success rate has been soaring without exception.

So, Anthropic now uses Claude to review code.

Yes, all changes submitted to the codebase first go through an automated Claude review, checking for bugs, security vulnerabilities, and other defects.

Their retrospective analysis found that if this automated review had been in place for every past change, about one-third of the bugs that caused incidents on claude.ai would have been caught before deployment.

Remember, the engineers writing that code are among the world's top experts in building AI systems.

Claude is catching their mistakes.

Creativity Amplifier

Next is Claude's involvement at the research level.

Anthropic has a routine: each time a new model is released, they give Claude a piece of code for training a small AI model and ask it to make the code run as fast as possible while preserving correctness.

In May 2025, Claude Opus 4 delivered: a 3x speedup.

In April 2026, Claude Mythos Preview achieved 52x.

For reference, a skilled human researcher would need 4 to 8 hours to barely reach 4x.

In less than a year, Claude surpassed humans.

In April 2026, Anthropic gave Claude an AI safety research question, essentially “Can a weak model reliably supervise a strong model?”, and let Claude propose hypotheses, run experiments...

First, the human performance: two human researchers spent about a week narrowing the gap by 23%.

Claude, after about 800 hours and roughly $18,000 worth of compute—

Narrowed it by 97%.
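The article does not define how "the gap" is measured. One common way to express it in weak-to-strong supervision work is the fraction of the performance gap recovered: how far the supervised model moves from the weak baseline toward the strong model's ceiling. The sketch below assumes that definition. The scores are hypothetical and were chosen only to reproduce the 23% and 97% figures; the compute cost and hours are the ones quoted above.

```python
def gap_recovered(weak: float, student: float, ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap that the supervised model closes."""
    if ceiling <= weak:
        raise ValueError("ceiling must exceed the weak baseline")
    return (student - weak) / (ceiling - weak)


# Hypothetical scores, chosen only to reproduce the article's percentages.
WEAK_BASELINE = 0.60   # weak supervisor's own accuracy (assumed)
CEILING = 0.80         # strong model trained on ground truth (assumed)

human_result = gap_recovered(WEAK_BASELINE, 0.646, CEILING)   # about 0.23
claude_result = gap_recovered(WEAK_BASELINE, 0.794, CEILING)  # about 0.97

# Figures from the article: about 800 hours and roughly $18,000 of compute.
cost_per_hour = 18_000 / 800

print(f"Humans: {human_result:.0%}, Claude: {claude_result:.0%}")
print(f"Compute cost: ${cost_per_hour:.2f} per hour")
```

On the quoted figures, the compute bill works out to about $22.50 per hour of Claude's work.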

Where Do We Go From Here?

By now, the conclusion is clear.

The human role in the AI development pipeline is narrowing at every stage.

Coding: Claude does it. Code review: Claude does it. Experiment execution: Claude is an order of magnitude faster than humans. Experiment design: Claude is starting to do it on its own...

The last comparative advantage humans have now is research taste and judgment.

But how long can this advantage hold?

Anthropic says in the blog they are unsure.

One possibility is that “research taste,” like other things AI couldn't do before, starts as impossible, then suddenly becomes possible.

Just as understanding humor, demonstrating theory of mind, and solving linguistic puzzles all followed similar curves.

Another possibility is that even if Claude never truly learns research taste, the current acceleration trend means each human researcher can now orchestrate several times more work simultaneously.

You don't need AI to think completely for you; it just needs to handle all the “execution” work, leaving you to make the 5% of directional choices.

Three Possible Futures for RSI

At the end of the blog, Anthropic outlines three possible evolutionary directions for this “self-evolution” trend.

1. Plateau.

Those exponential curves are actually S-curves.

Perhaps research judgment is something that simply cannot be solved by scaling and requires a completely new architectural breakthrough.

Or, the bottleneck lies in energy, chips, the physical supply chain of compute.

Even if AI capabilities plateau at today's level, it will still bring significant changes to the world.

The recent Project Glasswing saw Mythos Preview discover over ten thousand high and critical severity software vulnerabilities in its first few weeks, spanning the world's most critical systems.

2. AI continues to accelerate, but humans keep their hands on the wheel.

Organizational efficiency will improve exponentially, with 100-person companies doing the work of 10,000 or even 100,000.

Anthropic believes we are most likely heading into this scenario.

But they also observed an interesting phenomenon: the embodiment of Amdahl's Law within organizations—

Claude writes code much faster, making code review the new bottleneck. New ideas, tools, and experiments explode far beyond the organization's capacity to absorb them.

Bottlenecks don't disappear; they just shift to the next stage.
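Amdahl's Law puts a number on this shifting bottleneck: if only part of a workflow is accelerated, the overall speedup is capped by the part that is not. The sketch below is a minimal illustration. The 8x factor echoes the merge figure quoted earlier, but the 50% share of time spent writing code is a hypothetical value, not a number from Anthropic's post.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Hypothetical: writing code is half of the workflow and becomes 8x faster.
CODING_SHARE = 0.5
overall = amdahl_speedup(CODING_SHARE, 8)

# Even with near-infinite coding speed, the other half (review, absorbing
# new ideas) limits the whole workflow to just under 2x.
upper_bound = amdahl_speedup(CODING_SHARE, 1e12)

print(f"Overall speedup with 8x faster coding: {overall:.2f}x")
print(f"Ceiling with unlimited coding speed:   {upper_bound:.2f}x")
```

With these illustrative numbers, an 8x gain in coding yields under 1.8x overall, which is why review becomes the stage that determines throughput.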

3. AI achieves full recursive self-improvement, beginning to build the next generation of itself.

In this scenario, the speed of AI development depends entirely on compute. Humans retreat to supervisory, verification, and auditing roles.

If this happens, this capability will likely transfer to other scientific fields—medicine, materials, energy—all taking off.

Of course, another future is alignment failure.

In this case, misalignment could accumulate step by step during AI's self-iteration, ultimately leading to—complete loss of control.

One More Thing

The above covers the most critical points of Anthropic's thesis on self-evolution.

Honestly, at first, I didn't take it too seriously. After all, Anthropic is about to IPO. Isn't this a classic “Anthropic-style” PR move?

You know what? This time, it might genuinely be different.

Because just a few days ago, OpenAI published a similar blog post:

“We too see early signs of self-evolution in today's systems: AI development itself is being accelerated by AI. We expect this to intensify competitive pressures among developers and nations, and create governance challenges existing institutions cannot handle. With the emergence of RSI, society needs ways to shape AI's developmental trajectory to ensure it serves human interests.”

The singularity seems to be arriving faster than anyone anticipated.

Blog: https://www.anthropic.com/institute/recursive-self-improvement


This article is from the WeChat public account “QbitAI”, author: Focus on Frontier Technology
