Claude's Intelligence Decline: Suicide or Playing Dead?

marsbit | Published 2026-04-13 | Updated 2026-04-13

Summary

Recent reports indicate that Claude Opus 4.6, Anthropic's flagship model, has suffered a significant decline in performance, with its global ranking on BridgeBench dropping from 2nd to 10th. Accuracy fell from 83.3% to 68.3%, while the hallucination rate nearly doubled. Users expressed frustration over the model's reduced capability on complex tasks. However, leaked internal screenshots suggest Anthropic is shifting toward a broader strategy: developing Claude Projects into a full-stack application builder. The platform allows users to create functional applications, such as AI chatbots, interactive games, and SaaS dashboards, with minimal coding, potentially making traditional programming obsolete. The move appears to prioritize platform ecosystem development over leaderboard rankings. With annual revenue reaching $30 billion, largely from API usage, the company aims to reduce its dependence on commoditized AI models by creating a sticky, integrated environment where users build and deploy directly within its ecosystem. This pivot reflects a broader industry trend in which becoming indispensable infrastructure matters more than topping benchmarks.

【Introduction】Global No.2 Drops to No.10: Claude's Strongest Model Exposed for "Intelligence Decline," BridgeBench Confirms It! But Anthropic Doesn't Seem to Care?

Is Anthropic Finished?

Recently, AMD's AI Director confirmed Claude Code's intelligence decline, stating bluntly that it is "no longer usable for complex tasks."

Now, the latest report from the BridgeBench evaluation has delivered another heavy blow to Anthropic!

The data is staggering. Claude Opus 4.6's global ranking plummeted from 2nd place to 10th:

Accuracy dropped precipitously from 83.3% to 68.3%, and the hallucination rate nearly doubled, increasing by 98%.
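To put the accuracy figures in perspective, here is a minimal sketch of the arithmetic, using only the two scores quoted above. The variable names are illustrative, and the hallucination baseline is not given in the report, so only its multiplier can be derived.

```python
# Accuracy figures as quoted from the BridgeBench report; names are illustrative.
accuracy_before = 83.3  # percent
accuracy_after = 68.3   # percent

# Absolute drop, in percentage points.
point_drop = accuracy_before - accuracy_after

# Relative drop, as a share of the original score.
relative_drop = point_drop / accuracy_before * 100

# "Increasing by 98%" means the new hallucination rate is 1.98x the old one.
hallucination_multiplier = 1 + 98 / 100

print(f"Accuracy drop: {point_drop:.1f} points ({relative_drop:.1f}% relative)")
print(f"Hallucination rate multiplier: {hallucination_multiplier:.2f}x")
```

In other words, a 15-point drop is roughly an 18% relative loss in accuracy.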

Claude's intelligence had declined and the user experience had worsened. The cold, hard numbers put an end to users' doubts:

It wasn't their problem; Claude Opus 4.6 had indeed gotten worse!

Claude users felt cheated!

Imagine relying on this model for a critical task while the provider could swap in a much worse model without telling you.

Users asked: "How can this possibly be legal?" Trust began to crumble, ridicule of Anthropic flooded the internet, and even the most loyal supporters began to waver.

But just as the entire internet was mocking them, Anthropic's trump card emerged—a suspected screenshot of an internal tool interface was leaked.

What the image showed instantly made all discussions about "Claude getting dumb" irrelevant—Claude Projects is testing a complete full-stack application building system.

Not helping you write code, but helping you build products.

While everyone was arguing over model scores, Anthropic had already changed the game table.

What's Hidden in the Leaked Image?

First, let's talk about what exactly that screenshot captured.

According to cross-verification from multiple sources, the leaked image shows a "one-click development kit" being tested internally by Claude Projects.

The interface clearly lists a row of pre-built templates: AI chatbot, interactive mini-game, business landing page, SaaS data dashboard... covering almost all the high-frequency demand scenarios for independent developers.

But the templates are just the surface.

What truly makes one gasp is the full-stack capability chain behind the templates—

Authentication? Check and configure.

Database? Select and build.

Front-end interface? Describe and generate.

Deployment and launch? One click and done.

This is not "AI-assisted programming." This is "AI-replacing programming," and it doesn't even need to distill your skills anymore.

Understanding the weight of this statement requires a clear view of the current landscape of AI programming tools.

Cursor's logic is "making you code faster in the IDE"—it optimizes coding speed, the programmer is still the protagonist.
Replit's logic is "enabling those who can't code to code"—it lowers the entry barrier, but you still need to understand code logic.
Vercel's logic is "making deployment feel seamless"—it solves the last mile, but you have to walk the previous road yourself.

Each tackles one link in the software development chain and pushes it to the limit.

But what Claude wants to do is on a completely different dimension from them.

Cursor makes programmers 10 times faster, Replit lets non-programmers code—but Claude wants to make "coding" itself obsolete.

The former is an efficiency revolution, the latter is category elimination.

According to leaked information, the underlying engine powering this system is precisely Opus 4.6—the model being mocked across the internet for "intelligence decline."

Mythos "Not Strong Enough" Might Be Intentional?

The central, and perhaps most controversial, claim is this:

Anthropic might not care at all where Mythos ranks on the leaderboard.

Does that sound like making excuses for the loser? Let's do the math.

When your strategic endgame is to become a "full-stack application platform," the role played by the model layer changes fundamentally.

It no longer needs to be "the smartest," it only needs to be "good enough."

The key to winning platform competition has never been about the horsepower of the underlying engine, but about the depth of stickiness in the upper-layer ecosystem.

Windows beat Mac not because the OS was more elegant, but because the software ecosystem was richer. Android crushed Windows Phone not because the kernel was more advanced, but because there were more developers.

In platform wars, "the best" is never the reason for winning; "the most used" is.

In public, Dario Amodei has repeatedly said one thing: "Coding will die."

But the leak of the full-stack builder gives this statement product-level physical evidence for the first time.

Dario wasn't making a prophecy. He was describing a roadmap being executed.

If this reasoning holds, the benchmark numbers take on a completely different meaning: Mythos leads GPT-5.4 Pro on HLE (no tools, 56.8 vs 42.7), but has been caught on GPQA (94.4 vs 94.5) and overtaken on BrowseComp (89.3 vs 86.9).

It's not that "Anthropic lost," but that "Anthropic selectively stopped focusing effort here."

Should limited computing resources be poured into the leaderboard arms race to maintain an illusory "No.1" label, or should they be shifted toward a full-stack builder that can directly create commercial value?

For a company with annual revenue of $30 billion that needs to prove its commercialization capability to investors, the choice isn't difficult.

The model just needs to be good enough; platform lock-in is the moat.

The cruel truth of business competition is that users don't care whether your GPQA score is 94.4 or 94.5; they care about one thing: "I type a sentence. Does the app run?"

Fear After $30 Billion in Annual Revenue

Anthropic's annualized revenue just broke through $30 billion, surpassing OpenAI.

Anthropic's annualized revenue grew from $1 billion to $30 billion in 15 months
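For a sense of scale, the growth figure can be turned into an implied monthly rate. This is a rough sketch that assumes smooth compound growth across the 15 months, which the article does not claim; the variable names are illustrative.

```python
import math

# Figures quoted in the article (annualized revenue, in billions of USD).
start_revenue = 1.0
end_revenue = 30.0
months = 15

# Assumption: revenue compounded at a constant monthly rate over the period.
monthly_growth = (end_revenue / start_revenue) ** (1 / months) - 1

# Under the same assumption, how many months it takes revenue to double.
doubling_months = math.log(2) / math.log(1 + monthly_growth)

print(f"Implied monthly growth: {monthly_growth:.1%}")
print(f"Implied doubling time: {doubling_months:.1f} months")
```

Under that assumption, revenue would have grown about 25% per month, doubling roughly every three months.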

This is a number that would make any startup pop champagne.

But if you are Dario Amodei, your primary emotion right now isn't celebration, but fear.

Because the vast majority of this $30 billion comes from API calls. And APIs are essentially an extremely dangerous business model.

Why? Because APIs mean your customers are using your capabilities to build their own products.

Today they call Claude's API to build an AI customer service platform, tomorrow they build an AI writing tool, the day after they build an AI programming assistant.

Every successful customer is building their own skyscraper on your foundation. It sounds beautiful—until one day, another model company offers a cheaper, similarly usable API, and your customers collectively migrate overnight.

This is the "model commoditization" nightmare: when the differences at the model layer become smaller and smaller, API pricing becomes a price war with no winners.

OpenAI feels this fear, so it's frantically building consumer products: ChatGPT, GPTs, custom assistants. Google feels this fear, so it's stuffing Gemini into search, email, docs, and every other product it owns.

They are all doing the same thing: before models become as cheap as cabbage, turn themselves into a platform users cannot leave.

Anthropic's full-stack builder is the most radical version of this same logic.

Its subtext is:

Rather than wait for others to build a platform on top of my API, and then kick me aside the day model prices drop, I'll build the platform myself first.

You don't need to call my API anymore; you can build Apps directly on my platform. Your user data is here, your workflow is here, your deployment environment is here. By then, if you want to change models? Sure, but your entire business has to start over.

This isn't product innovation; it's survival instinct.

The $30 billion in revenue proves Anthropic can make money, but the leak exposes Anthropic's true anxiety: just making money isn't enough; you have to make others unable to leave you.

Conclusion: The Starry Sky and the Illusion

Let's step back from the business narrative and return to the origin of technical judgment.

The current top large models—whether Claude, GPT, or Gemini—are operating at about a 70% capability level. The climbing speed of this number in the past half year has visibly slowed down.

Moving from 70% to 100% doesn't depend on leaderboard grinding, or on gaining a few more points on GPQA. It depends on becoming irreplaceable infrastructure, like the power grid: you don't care what turbine the power plant uses; you just know the light comes on when you flip the switch and the AC cools when you turn it on.

Anthropic's full-stack builder is the first time we've seen an AI company seriously thinking about this path of "infrastructuralization."

No longer obsessed with the vanity war of "my model is 0.1 points smarter than yours," but directly answering a more fundamental question: How can I get a billion people to use my stuff every day, without even realizing it?

Because what ultimately decides the AI endgame is never whose exam score is higher. It's who first becomes the power grid that no one can live without.

This article is from the WeChat public account "新智元" (New Wisdom Element), edited by: KingHZ

Related Questions

Q: What significant performance drop did Claude Opus 4.6 experience according to BridgeBench's report?

A: Claude Opus 4.6's global ranking dropped from 2nd to 10th place, with its accuracy plummeting from 83.3% to 68.3% and its hallucination rate nearly doubling, increasing by 98%.

Q: What major new capability was revealed by the leaked internal tool screenshot from Anthropic?

A: The leaked screenshot revealed that Claude Projects is testing a full-stack application building system, described as a "one-click development kit," capable of generating complete applications from templates and handling authentication, database setup, front-end generation, and deployment.

Q: According to the article, why might Anthropic be intentionally deprioritizing performance on benchmark leaderboards?

A: The article suggests Anthropic may be strategically shifting its limited computing resources away from benchmark competition to focus on developing its full-stack builder platform, prioritizing platform lock-in and commercial viability over having the "smartest" model.

Q: What fundamental shift in business strategy does the Claude Projects platform represent for Anthropic?

A: It represents a shift from providing an API service, which is vulnerable to commoditization and price competition, to becoming a full-stack platform that hosts entire development workflows, creating deeper customer lock-in and making the underlying model less replaceable.

Q: What is the article's perspective on the ultimate determinant of success in the AI industry?

A: The article argues that ultimate success won't be determined by which model has slightly higher benchmark scores, but by which company first becomes indispensable infrastructure, like an electrical grid, that billions of people use daily without thinking about it.
