Anthropic Cries Wolf: Is the AGI Threat Real, or Just an IPO Story?

marsbit · Published 2026-06-05 · Updated 2026-06-05

Article Summary

Anthropic has published an article titled "When AI builds itself," discussing the emerging concept of "recursive self-improvement," where AI begins to actively participate in designing, training, testing, and optimizing its own subsequent versions. The company presents internal data showing that by May 2026, over 80% of code merged into its codebase was written by Claude, its AI model. Claude's capabilities have expanded to handling complex, open-ended engineering tasks, achieving a 76% success rate in such areas, and even contributing to research processes, such as optimizing code performance and conducting AI safety experiments. Anthropic outlines an evolution from human-driven development to AI-assisted workflows, culminating in the current stage where AI agents can autonomously write, run, and delegate code. The company cautions that the path toward a "closed loop," where AI continuously improves itself, is becoming visible. It calls for coordinated global mechanisms to potentially slow or pause frontier AI development to allow safety research and societal structures to catch up. However, the timing of this warning coincides with Anthropic's preparations for an IPO, framing the narrative not just as a safety concern but also as a demonstration of Claude's advanced capabilities and its integral role in accelerating Anthropic's own R&D—creating a potential "flywheel" effect for competitive advantage. This contrasts with OpenAI's recent, more policy-oriented discussion of ...

By Alphabet AI

Anthropic published a lengthy article last night titled "When AI builds itself," which sounds like a science fiction novel by Asimov, and indeed deals with a sci-fi concept: recursive self-improvement.

Simply put, in the past, human researchers wrote code, ran experiments, and trained models to make AI stronger. But if AI starts to participate in designing, training, testing, and optimizing its own successors, then the speed of AI progress is no longer driven solely by humans—it may begin to "self-evolve."

To this end, Anthropic made a plea:

"We believe it would be beneficial for the world if there were an option to slow down or temporarily pause frontier AI development, allowing societal structures and alignment research to catch up with technological progress."

This statement sounds like a safety warning, but in the context of Anthropic preparing for an IPO, it's hard not to see it as another kind of narrative setup: Claude is so good, it's even starting to create the next generation of Claude itself.

A New Storm Has Emerged

To illustrate that AI is increasingly involved in AI research and development itself, Anthropic presented substantial internal data.

For instance, as of May 2026, over 80% of the code merged into Anthropic's codebase was written by Claude. Before the release of Claude Code, this number was only in the single digits.

By the second quarter of 2026, according to Anthropic's statistics, the daily volume of code merged by engineers was about 8 times higher than in 2024.

More notable than the code volume is that Claude is handling more open-ended engineering problems.

Anthropic stated in the article that over the past year, the frequency with which employees had to correct Claude, steer it back on track, or take over tasks mid-way has been steadily declining. This change is happening not only for simple tasks but also for the most complex, open-ended tasks.

Open-ended tasks are problems that come without clear instructions, such as a system crash or a failed training job: issues where even the engineers don't initially know what the solution looks like and have to troubleshoot and make judgment calls on the fly.

These are the tasks that have historically relied most heavily on human experience. Yet on the most open-ended tasks, Claude's success rate reached 76% by May 2026, an increase of 50 percentage points within six months.
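As a quick sanity check on what that figure implies, here is a minimal sketch of the arithmetic. The function name is illustrative and not something from Anthropic's article. The only inputs are the two numbers quoted above.

```python
def implied_baseline(current_rate: float, gain_pp: float) -> float:
    """Return the earlier rate implied by a gain stated in percentage points."""
    return current_rate - gain_pp


current = 76.0  # success rate in May 2026, in percent (from the article)
gain = 50.0     # increase over six months, in percentage points (from the article)

baseline = implied_baseline(current, gain)
relative = current / baseline  # how many times higher the rate is now

print(f"Implied rate six months earlier: {baseline:.0f}%")
print(f"Relative improvement: {relative:.1f}x")
```

A gain of 50 percentage points is not the same as a 50% improvement: on these numbers the success rate went from roughly 26% to 76%, or nearly three times the earlier level.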

Beyond writing code, Anthropic also uses Claude for code review: checking for bugs, security vulnerabilities, and other defects. Its retrospective analysis found that if every past code change had undergone automated review by Claude, approximately one-third of the bugs that caused incidents on claude.ai could have been caught before deployment.

Going a step further, Claude has begun to participate in the research process.

Anthropic has a standard test: give Claude code for training a small model and ask it to make the code run faster without altering the results. In May 2025, Claude Opus 4 could achieve about a 3x speedup; by April 2026, Claude Mythos Preview had pushed that number to approximately 52x.

Anthropic also mentioned an open-ended AI safety research case. They posed a question to a Claude-powered agent: Can a weaker model reliably supervise a stronger model?

This process involved proposing hypotheses, testing them, sharing findings with parallel agents, and iterating repeatedly.

Two human researchers spent a week bridging about 23% of the gap; Claude, with roughly 800 cumulative hours and about $18,000 in compute costs, bridged 97%.
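The article does not say how "the gap" was measured. One common way to express such a figure, assumed here purely for illustration, is the fraction of the distance between a weak baseline score and a strong ceiling score that a method recovers. The scores below are hypothetical, chosen only so the outputs match the quoted percentages. The cost-per-hour line uses the article's own figures.

```python
def gap_recovered(weak: float, achieved: float, ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap closed by the achieved score."""
    if ceiling == weak:
        raise ValueError("ceiling and weak baseline must differ")
    return (achieved - weak) / (ceiling - weak)


# Hypothetical scores (not from the article), used only to illustrate the ratio.
weak_score, ceiling_score = 60.0, 80.0
human_score, claude_score = 64.6, 79.4

human_gap = gap_recovered(weak_score, human_score, ceiling_score)
claude_gap = gap_recovered(weak_score, claude_score, ceiling_score)

# Figures quoted in the article: about $18,000 over roughly 800 cumulative hours.
cost_per_hour = 18_000 / 800

print(f"Humans: {human_gap:.0%} of the gap, Claude: {claude_gap:.0%} of the gap")
print(f"Approximate compute cost per agent-hour: ${cost_per_hour:.2f}")
```

Under this reading, 97% means the agent's result landed very close to the ceiling, at a cost of roughly $22.50 per cumulative hour of agent work.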

This result certainly has limitations—the problem was chosen by humans, the scoring criteria were human-defined, and the findings haven't been fully migrated to production-scale models. But it still illustrates that Claude can now, within a research framework defined by humans, design experiments, execute them, and iterate on its own.

Furthermore, when human researchers "go down the wrong path," Claude can suggest a better next step.

Anthropic took 129 internal Claude Code research sessions in which human researchers and Claude worked together on open-ended research problems. It identified points where "the human later proved to have taken a detour," gave the context up to that point to different versions of Claude, and asked each what it would suggest doing next. A separate Claude judge, aware of the full session outcome, then decided which was better: the model's suggestion or the human's choice at the time.

The results showed that at those points where the human researcher was later shown to have had room for improvement, Claude became increasingly able to propose a better next step.
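The evaluation described above can be sketched roughly as follows. This is not Anthropic's code. The data structure, the function names, and the toy stand-ins for the model and the judge are all hypothetical, included only to make the shape of the protocol concrete.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DetourPoint:
    context: str       # session transcript up to the decision point
    human_choice: str  # what the researcher actually did next
    outcome: str       # how the full session turned out (visible to the judge only)


Suggest = Callable[[str], str]              # model under test: context -> next step
Judge = Callable[[DetourPoint, str], bool]  # True if the model's suggestion is better


def model_win_rate(points: List[DetourPoint], suggest: Suggest, judge: Judge) -> float:
    """Share of detour points where the judge prefers the model's suggestion."""
    if not points:
        raise ValueError("need at least one detour point")
    wins = sum(judge(p, suggest(p.context)) for p in points)
    return wins / len(points)


# Toy stand-ins (hypothetical): a real setup would call a model for both roles.
def toy_suggest(context: str) -> str:
    return "inspect the data"


def toy_judge(point: DetourPoint, suggestion: str) -> bool:
    # Prefer the suggestion if it shares a word with the known outcome.
    return bool(set(suggestion.split()) & set(point.outcome.split()))


points = [
    DetourPoint("loss spikes at step 1k", "retune learning rate", "corrupted data shard"),
    DetourPoint("eval score flat", "train longer", "eval set leak"),
]

rate = model_win_rate(points, toy_suggest, toy_judge)
print(f"Model suggestion preferred at {rate:.0%} of detour points")
```

Running the same detour points through successive model versions and comparing the resulting win rates is what would show the trend the article describes.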

In the past, AI model progress was primarily driven by human researchers and engineers. Humans decided what experiments to run, wrote the code, trained the models, and pushed forward AI's capabilities.

Now, more and more links in this chain are being taken over by Claude.

Anthropic presented a very intuitive stage diagram:

From 2021 to 2023, Anthropic was no different from a typical tech company—humans writing code and documentation on laptops.

From 2023 to 2025, chatbots began entering workflows. Engineers had models generate code snippets, then copied them into editors.

From 2025 to 2026, programming agents emerged. Claude began autonomously writing and modifying code, sometimes even completing entire files independently.

Today, agents can run code on their own and delegate hours-long work to other agents.

Ahead lies the stage Anthropic is genuinely concerned about: the closed loop.

If that day arrives, subsequent versions of Claude might be continuously improved by Claude itself. This is recursive self-improvement.

Anthropic phrased it cautiously: we haven't reached that point yet, and recursive self-improvement isn't inevitable. But it still emphasizes that the path leading to that step is beginning to become visible.

That's why Anthropic discusses slowing down, even pausing, at the end of the article. The point is not that all AI companies should shut down immediately, but that if the risks of AI self-improvement continue to rise, frontier labs will need a coordinated, verifiable deceleration mechanism.

In other words, the "singularity" is approaching, and humanity must impose controls.

Unstoppable Claude

On the surface, this is a very forward-looking safety document. Anthropic is talking about recursive self-improvement, about AI potentially improving itself faster and faster, and about the need for human society to prepare deceleration and pause mechanisms in advance.

But placed in the context of Anthropic preparing for an IPO, this article takes on another layer of meaning.

In a way, Anthropic's recent moves resemble that annoyingly smug top student in class—it genuinely has the skills, but it's also quite pretentious.

What it wants to say isn't just "we have a very strong Claude"; a step beyond that, it wants to say "Claude is helping us build an even stronger Claude."

If Anthropic were merely selling a model or a tool, it would struggle to escape horizontal comparisons: Anthropic has Claude, OpenAI has GPT; Anthropic has Claude Code, OpenAI has Codex; Anthropic competes for enterprise clients, and so does OpenAI. The two companies are locked in tight competition, each trying to tell the market the bigger story.

It's worth noting that just three days ago, OpenAI wrote in a document about frontier AI governance:

"We are already seeing early signs of recursive self-improvement in today's systems: AI development itself is being accelerated by AI.

This will intensify competitive pressures among developers and nations, and create governance challenges that existing institutions are not equipped to handle."

Three days later, Anthropic says: The path for Claude towards recursive self-improvement is beginning to become visible.

If Claude develops as Anthropic hopes, this would not be an ordinary product narrative; it would become a research and development flywheel.

Claude writes code, runs experiments, optimizes training processes, which in turn reduces incidents in Anthropic's own products… Once this system is up and running, Claude isn't just a product from Anthropic; it's a crucial production tool for Anthropic itself.

Users see the Claude product; enterprise customers buy Claude's capabilities. But what Anthropic truly wants the capital markets to notice is: Claude is already embedded in the underlying processes of frontier model development; it's been placed inside Anthropic's engine room.

Capital markets love flywheel stories, promising endless prosperity: A stronger Claude allows Anthropic's engineers to merge more code; more code enables faster product and infrastructure iteration; faster iteration allows researchers to run more experiments; more experiments in turn help the next generation of Claude become stronger. Once the next generation Claude is stronger, it continues to accelerate Anthropic's R&D.

Claude's iteration pace also supports this flywheel. Judging by public release timelines, from 2023 to early 2025 Claude's major model updates mostly arrived on a three-to-four-month cycle. Starting with Claude 4, however, the release cadence has noticeably accelerated.

Claude 4 was released in May 2025, Opus 4.1 in August, Sonnet 4.5 in September, Haiku 4.5 in October, Opus 4.5 in November.

In 2026, Opus 4.6 was released on February 5, Sonnet 4.6 on February 17, Opus 4.7 on April 15, and Opus 4.8 on May 28. The gap between Opus 4.7 and Opus 4.8 was only 43 days.
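Taking the 2026 dates exactly as listed above, the spacing between consecutive releases can be checked with a short date calculation. The helper name is illustrative.

```python
from datetime import date

# 2026 release dates as listed in the article.
releases = [
    ("Opus 4.6", date(2026, 2, 5)),
    ("Sonnet 4.6", date(2026, 2, 17)),
    ("Opus 4.7", date(2026, 4, 15)),
    ("Opus 4.8", date(2026, 5, 28)),
]


def gaps_in_days(items):
    """Return (earlier, later, days) for each pair of consecutive releases."""
    return [
        (prev_name, next_name, (next_day - prev_day).days)
        for (prev_name, prev_day), (next_name, next_day) in zip(items, items[1:])
    ]


gaps = gaps_in_days(releases)
for prev_name, next_name, days in gaps:
    print(f"{prev_name} -> {next_name}: {days} days")
```

With these dates the script reports gaps of 12, 57, and 43 days, so the two most recent Opus releases are separated by about six weeks.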

Anthropic, on the surface, is saying "this could be very dangerous, we need to prepare the brakes in advance," but it's simultaneously implying: "We've seen what happens when the accelerator is pressed."

The subtlety of the IPO narrative lies here. It describes the risks as significant while also elevating its own technological position.

Not every AI company is qualified to discuss recursive self-improvement. You first need to make the outside world believe your AI is already part of the AI R&D process to have the standing to say this might require global coordination.

OpenAI: How Could This Happen?

As mentioned earlier, just before Anthropic published this lengthy article, OpenAI had already put recursive self-improvement on the table.

But the two companies' narratives are quite different.

OpenAI's document, "Democratic Governance of Frontier AI," is a policy blueprint for Washington. It's concerned not with "how models get stronger," but with how to constrain frontier AI if it continues to surge ahead.

Most of the content in that report isn't suitable for detailed discussion here, but one key line stands out: OpenAI said that in today's systems, early signs of recursive self-improvement are already visible.

This line and Anthropic's lengthy article point in the same direction.

It's just that OpenAI talks about institutions, while Anthropic talks about itself.

OpenAI's point is: AI development is too fast; existing governance structures may not keep up, so a new set of rules is needed.

Anthropic, by contrast, puts its own system on display and tells the market: Claude is already in our R&D process, so we can see the path to AI self-acceleration.

This move is quite clever. One imagines the grumbling inside OpenAI—this is practically idea theft! We were here first!

Just joking, but OpenAI really needs to step up its game and quickly bring GPT 5.6 to the table.
