AI Giants Enter the Dark Forest

marsbit | Published 2026-04-25 | Last updated 2026-04-25

Introduction

In the AI industry's "dark forest," major players like Anthropic, OpenAI, and DeepSeek are strategically withholding their most advanced models to avoid becoming targets in a high-stakes competitive landscape. Anthropic released Claude Opus 4.7 but admitted it underperforms their unreleased model Mythos, which they are holding back citing safety concerns. They delayed addressing user complaints about performance regression until OpenAI's GPT-5.5 launch, highlighting a tactic of controlled disclosure aligned with competitors' moves. OpenAI's GPT-5.5, though the first full retrain since GPT-4.5, was seen as incremental rather than revolutionary. Leaks revealed internal models like Glacier and Heisenberg, indicating significant unreleased capabilities. OpenAI acknowledges a "capability overhang," where real model power exceeds what users experience, often due to infrastructure-driven throttling. DeepSeek launched V4 Preview, a cost-efficient model, but its full potential (V4 Pro Max) awaits the mass production of Huawei's Ascend 950 super nodes in late 2026. Their strategy focuses on affordability and scalability, aiming to democratize AI access globally, a move noted even by NVIDIA's CEO as a disruptive threat. Together, these actions reflect a broader trend: leading AI labs are deliberately pacing releases, hiding strengths, and aligning disclosures with competitive dynamics, each avoiding the risk of exposure in a forest where first movers become targets.

By Xiang Xianzhi

In "The Three-Body Problem," Liu Cixin wrote an image that has been cited countless times—the dark forest. Every civilization is a hunter with a gun; whoever exposes themselves first dies first. The forest is not empty—it's that everyone knows turning on a light will attract bullets, so everyone keeps their lights off.

In the spring of 2026, top AI labs entered such a dark forest.

On April 16, Anthropic was the first to release Claude Opus 4.7. On the same day, they made an unusual move—publicly admitting that Opus 4.7's performance was not as good as an unreleased model called Mythos, citing safety concerns.

On April 23, OpenAI posted GPT-5.5 on its official website. On the same day, Anthropic published an incident review report titled "An update on recent Claude Code quality reports" on its official blog, admitting that Claude Code had indeed become dumber over the past month—one releasing a new card, the other admitting a mistake. But this "new king" was almost showing off: we admit Claude has temporarily become dumber—but don’t forget, we still have a Mythos card up our sleeve.

On April 24, the "mysterious Eastern force" DeepSeek V4 Preview was launched, with Liang Wenfeng's team officially announcing the model's deep integration with Huawei's Ascend 950PR for the first time; but everyone understood—the truly "full-blooded" V4 Pro Max would only be released after the mass production of Ascend 950 super nodes in the second half of the year.

Three companies, three moves. On the surface, each is simply following its own product rhythm, but when pieced together, one thing becomes clear:

Each one holds at least one "gun"—a model stronger than the public version, a next-generation architecture not yet released to the public, or a super node of chips not yet widely deployed. But none dare to raise this gun first.

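Because in this industry, the cost of "showing first" is far more than just leaking secrets. Showing first means handing your capability ceiling to competitors as a reference; it means being the first to absorb the full firepower of safety scrutiny, tighter regulation, and public pressure; it means turning yourself into the moving target that every competitor will aim at in the next round. There is no heroism in the forest: everyone who fires first becomes the next target.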

So the most rational choice for hunters is to turn off the lights, hold their breath, and keep their weapons hidden behind their backs.

This is the optimal solution in game theory.

Anthropic's Fearlessness

On Claude's side, the past month has almost been the worst version release ever.

Having updated to Opus 4.7 early, Anthropic still dominated various charts and still had Mythos, which is provided only to enterprise customers. Its attitude seemed unhurried.

But the Opus 4.7 cycle was almost the worst user experience for Claude, with a flood of negative reviews.

In early March, Anthropic changed Claude Code's default reasoning depth from high to medium. The intention behind this decision was understandable: in high mode, the UI often appeared frozen, with responses so slow that paying users were frustrated. The problem was, they didn’t announce it at the time.

At the end of March, they launched an "efficiency optimization"—if a Claude Code session was idle for more than an hour, the system would clear old reasoning blocks. In design, this was to save computing power. In practice, after each round of conversation, Claude seemed to have amnesia, forgetting the context completely. The developer community was flooded with complaints during those weeks: "Claude no longer remembers what I asked it to do in the previous round."

Recently, a third thing happened—adding an instruction to compress verbosity in the system prompt. By Anthropic's own later admission, this instruction directly reduced Claude Code's coding quality by 3%.

Stacked together, these three changes led an AMD senior director to write on GitHub: "Claude has regressed to the point it cannot be trusted to perform complex engineering." Axios' April 16 article, "Anthropic's AI downgrade stings power users," brought the issue into the mainstream spotlight.

Then Anthropic admitted that there were indeed some issues.

On April 7, they quietly rolled back the reasoning effort adjustment; on April 10, they fixed the cache bug; on April 20, they removed the system prompt compressing verbosity. But the real incident review report wasn’t released until April 23—coinciding with the day GPT-5.5 was publicly released.

This sense of slight contempt—"oh, there was a bug in our engineering strategy, it’s fixed now"—came just before and after OpenAI's heavyweight release. It’s hard to believe this was a coincidence.

What's more intriguing is that when Opus 4.7 was released, Anthropic made an unusual move: publicly admitting that Opus 4.7's performance fell short of an unreleased model, Mythos. This was clearly a "strategic retreat": Anthropic kept its strongest capabilities on the enterprise side and was in no hurry to release them to the public because the team wasn't ready to release Mythos.

This explanation is believable. But from a business narrative perspective, the other half is equally true: Anthropic waited six weeks to publicly admit that Claude Code was regressing, and only brought up the issue on the day OpenAI was about to play a new card. If not for sufficient competitive pressure, if Opus 4.7 hadn’t proven "we still have a backup," this statement might never have come.

On Claude's side, squeezing toothpaste doesn’t mean deliberately crippling capabilities; it means: the pace of capability release and the timing of issue disclosure are both aligned with competitors' rhythms.

Releasing your most advanced capabilities will inevitably make you a target. Or, from Anthropic's perspective, the pressure from 4.6 on competitors hasn’t faded—so there’s no need to play the stronger card now.

OpenAI's Old Tricks

If Anthropic is "hiding a Mythos and not releasing it," then OpenAI's toothpaste-squeezing is more subtle—it leaves the power of capability release in the load curve of its own servers and a tiering mechanism called auto-router.

On April 23, the same day GPT-5.5 was released, Simon Willison (co-creator of the Django framework, well-known independent AI evaluator) wrote a cautious sentence in his blog: "It's not a dramatic departure from what we've had before."

He added a key piece of information: GPT-5.5 is the first completely retrained base model since GPT-4.5; that is, the past half-year's releases of 5.1, 5.2, 5.3, and 5.4 were all just incremental updates. In other words, OpenAI released the past four minor updates with restrained effort—because they didn’t know what competitors would release.

"Releasing with restrained effort" has a more understandable name: squeezing toothpaste.

But a more memorable scene occurred hours after GPT-5.5 went live. Codex users filed Issue #19241 on GitHub, complaining that Fast mode was initially very fast but became visibly slower as more users were let in, while billing was still at the Fast tier. The wording was familiar: "Please investigate whether GPT-5.5 Fast mode is downgraded under high load."

This was almost an exact replay of the scene on August 7, 2025, the day GPT-5 was first released—back then, Reddit r/ChatGPT pushed "GPT-5 is horrible" to 4600+ upvotes, and Sam Altman personally admitted in an AMA the next day that "the autoswitcher broke... GPT-5 seemed way dumber"—admitting that the router had downgraded users behind the scenes.

The same script played out again eight months later.

More ironically, the day before GPT-5.5's official release, OpenAI's Codex mistakenly pushed the internal staging environment to production, captured by several Pro users, fixed within minutes, but the content had already spread. What appeared in the selector at that time, besides GPT-5.5 itself, was a series called Glacier (tooltip reading "Intelligence that moves continents"), a life sciences model called Heisenberg, an unknown-purpose model called Arcanine, and multiple versions with codenames like oai-2.1.

That is, at the same time OpenAI released GPT-5.5 as the "next generation," internally there were at least 5 to 6 parallel product lines running, none of which had reached the public yet.

OpenAI itself admitted it. In its official 2026 roadmap, they used a term long discussed in academic circles—capability overhang—admitting that there is a huge gap between the true capabilities of current large models and the effects users can actually achieve.

Sound familiar? It’s almost the same wording Anthropic used for Mythos. Even if the Codex leak on April 22 was truly a mistake, OpenAI actively putting the term "capability overhang" into its roadmap sends a clear message—we have much more in hand, you deal with it.

You can only squeeze if you have far more than what you sell to users. The 24 hours of GPT-5.5 turned this premise into a live broadcast once again.

DeepSeek's Patient Wait

On DeepSeek's side, the way of "squeezing" has completely changed—it is not hiding capabilities but waiting for a more suitable delivery time.

1.6T MoE, 1M context, Pro/Flash dual specifications, priced at 3.48 per 1M tokens—dozens of times cheaper than GPT-5.5, an order of magnitude difference from Opus 4.7. Overseas independent evaluators concluded with two sentences: performance is close to but slightly lower than GPT-5.4 / Gemini 3.1-Pro, price "shatters the economics of frontier labs."

But in DeepSeek's own coordinate system, V4 Preview is already significantly more expensive than V3's "bizarrely cheap" price. Everyone knows—this is not the full-blooded version.

The complete story of DeepSeek V4 does not end with its release, nor does it start with it.

It starts with the R2 release that did not happen on schedule in 2025. R2 was originally scheduled for release in May 2025 but was eventually postponed to autumn/winter. DeepSeek's entire infrastructure in China migrated to Huawei's CANN ecosystem. For any lab, this is not an engineering feat that can be completed in a quarter: the compiler, operators, communication libraries, inference framework, and MoE routing all had to be rewritten.

And this time with V4, it is the first time DeepSeek officially wrote Ascend into the training hardware list. V4 is the first version of mixed training—Ascend's first entry.

But the next-generation chip Ascend 950DT, optimized for large-scale training, is scheduled for mass production in Q4 2026 according to Huawei's roadmap. That is, V4 training was made to run by cobbling together the previous-generation 950PR; to make the full-blooded version, V4 Pro Max, a 1.6T MoE model, both fully trainable and deployable at scale, we must wait for the next generation to arrive.

The real engineering challenge is not "whether V4 can be trained"—it has been trained—but "how to make V4 run fully, stably, and cheaply on Ascend."

Ascend 950PR entered mass production in Q1 2026, with FP4 computing power of 1.56 PFLOPS and on-chip memory of 112GB, paper specifications that benchmark against and exceed NVIDIA's H20. But going from a single chip running to a whole super node stably serving inference at millions of tokens per second is a different matter. The full-blooded version of V4 Pro Max is locked to this "super node," the large-scale cluster version of the Ascend 950 series, which will be available in the second half of 2026.

This constitutes a strategy completely different from the previous two. Anthropic and OpenAI's logic for squeezing toothpaste is: I have something stronger, I won’t give it to you yet; DeepSeek's logic for squeezing toothpaste is: my full-blooded version must wait for a moment when the price can drop another notch.

This difference is important.

DeepSeek's real killer feature has never been "the most cutting-edge performance," but "with adequate performance, cutting the token price to a level others dare not." V4 Preview has been adapted to run on NVIDIA cards and Ascend 950PR, but to achieve full-blooded inference at production scale, we must wait for the super nodes to arrive. Once that moment comes, two things will happen simultaneously: first, V4 Pro Max's capabilities can be released to the maximum; second, inference costs and API pricing will drop another level. For a company that relies on price to break through the market, the latter is more lethal than the former.

What people truly expected, the "DeepSeek moment" that happened in early 2025, did not happen again in this release. And the release of V4 Preview is actually a trailer; the real highlight is the "DeepSeek + Huawei Ascend" moment in the second half of the year.

From this perspective, what Liang Wenfeng's team is doing now is not forced "hiding" but a commercially restrained "choice": handing the premiere of the strongest version to the scenario where it has the most say, the first day after the large-scale deployment of domestic super nodes. Before that, it can use V4 Preview to consolidate the cost-effectiveness narrative for another round.

What DeepSeek carries has never been the "longboard narrative" of making domestic large models rank first on some chart, but the "systemic narrative" of simultaneously making chips, training, inference, and pricing work together—the latter is far more important than the former.

Just a few days ago, Jensen Huang said on Dwarkesh Patel's podcast that if DeepSeek premieres on Huawei chips, "that's a horrible outcome for our nation."

NVIDIA still controls the top computing power for now. But according to the "AI five-layer cake" that Jensen Huang himself proposed—energy, chips, infrastructure, models, applications—the domestic large model industry already has workable domestic options at every layer, and the gap is narrowing at a visible speed. With the final piece of the chip puzzle in place, DeepSeek's open-source large model story becomes a bigger story than American large models: this is an important step towards global intelligence parity without excessive cost consumption.

It would allow the world to bypass certain advanced computing powers controlled by hegemony and enter an efficient intelligent society.

Epilogue

Anthropic's "hiding" is active. They have Mythos and did not release it, citing safety.

OpenAI's "hiding" is structural. They have Pro tiers and do not always give them to you, citing infrastructure and price tiers.

DeepSeek's "hiding" is necessary. It concerns a whole set of narrative templates for a leap in societal intelligence.

But from another perspective, this is exactly like the dark forest depicted by Liu Cixin: in this dark forest of intelligence, no top hunter will fire the first shot.

Exposure means holding nothing in reserve, having no hole card left, and becoming a live target for another hunter.

No one knows who will fire the deadliest shot first. But one thing is certain: every model you use today is not its true form.

Related Questions

Q: What is the 'Dark Forest' metaphor used to describe in the context of top AI labs in 2026?

A: The 'Dark Forest' metaphor, from Liu Cixin's 'The Three-Body Problem', describes a state where each top AI lab is like an armed hunter. Exposing one's full capabilities first makes a lab a target for competitors, leading to a strategic equilibrium where everyone hides their strongest models and advancements to avoid becoming a moving target for scrutiny, regulation, and competitive pressure.

Q: What was the strategic reason Anthropic gave for not releasing its most powerful model, Mythos, to the public?

A: Anthropic cited 'safety concerns' as the official reason for not releasing its most powerful model, Mythos, to the public. Strategically, this also allowed them to retain their strongest capability as a competitive advantage, avoiding the pressure of being the first to set a new benchmark that others would aim to surpass.

Q: How did OpenAI demonstrate the concept of 'capability overhang' with the release of GPT-5.5?

A: GPT-5.5 was OpenAI's first fully retrained base model since GPT-4.5, which revealed that the previous minor versions (5.1 to 5.4) were only incremental updates. Furthermore, a leak showed they had multiple other advanced, unreleased models in development (e.g., Glacier, Heisenberg), suggesting they possess far more advanced capabilities than what is currently available to users.

Q: Why is DeepSeek's V4 Preview not considered its 'full-blooded' version, and what are they waiting for?

A: DeepSeek's V4 Preview is not the 'full-blooded' version because its training and current operation rely on the previous-generation Ascend 950PR chips. The company is waiting for the mass production and deployment of the next-generation Ascend 950DT super nodes in the second half of the year. This will allow the 'V4 Pro Max' version to run at full capacity and enable a further drastic reduction in inference costs, which is core to DeepSeek's market strategy.

Q: What common strategic behavior did all three AI labs (Anthropic, OpenAI, and DeepSeek) exhibit, according to the article?

A: All three labs exhibited the strategic behavior of 'withholding' or 'squeezing' their full capabilities. They each possess more advanced technology (a stronger model, a next-gen architecture, or more efficient hardware) than what they have released to the public. None are willing to be the first to fully reveal their hand, as it would make them a target for competitors and regulators in the 'Dark Forest' of AI competition.

