Claude's Intelligence Decline: Suicide or Playing Dead?

marsbit | Published on 2026-04-13 | Last updated on 2026-04-13

Abstract

Recent reports indicate that Claude Opus 4.6, Anthropic's flagship model, has experienced a significant decline in performance, with its global ranking dropping from 2nd to 10th on BridgeBench. Accuracy fell sharply from 83.3% to 68.3%, while the hallucination rate nearly doubled. Users expressed frustration over the model's reduced capability for complex tasks. However, leaked internal screenshots suggest Anthropic is shifting focus toward a broader strategy: developing Claude Projects, a full-stack application builder. This platform allows users to create functional applications—such as AI chatbots, interactive games, and SaaS dashboards—with minimal coding, potentially making traditional programming obsolete. Anthropic’s move appears to prioritize platform ecosystem development over model leaderboard rankings. With annual revenue reaching $30 billion, largely from API usage, the company aims to reduce dependency on commoditized AI models by creating a sticky, integrated environment where users build and deploy directly within its ecosystem. This strategic pivot reflects a broader industry trend where becoming an indispensable infrastructure matters more than having the highest-performing model in benchmarks.

[Introduction] Global No. 2 Drops to No. 10: Claude's Strongest Model Exposed for "Intelligence Decline," and BridgeBench Confirms It! But Anthropic Doesn't Seem to Care?

Is Anthropic Finished?

Recently, AMD's AI Director confirmed Claude Code's intelligence decline, stating bluntly that it is "no longer usable for complex tasks."

Now, the latest report from the BridgeBench evaluation has delivered another heavy blow to Anthropic!

The data is staggering. Claude Opus 4.6's global ranking plummeted from 2nd place to 10th place.

Accuracy dropped precipitously from 83.3% to 68.3%, and the hallucination rate nearly doubled, increasing by 98%.
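To put the size of that drop in perspective, the following minimal Python sketch separates the absolute change (in percentage points) from the relative change. The function names are illustrative and not part of any BridgeBench tooling; the only inputs are the figures quoted above.

```python
def point_drop(before: float, after: float) -> float:
    """Absolute change, in percentage points."""
    return before - after


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, as a percentage."""
    return (after - before) / before * 100


# Accuracy figures quoted in the article.
accuracy_before, accuracy_after = 83.3, 68.3

print(f"Absolute drop: {point_drop(accuracy_before, accuracy_after):.1f} points")
print(f"Relative change: {relative_change(accuracy_before, accuracy_after):.1f}%")

# A 98% relative increase means the new rate is 1.98 times the old one,
# which is why it is described as "nearly doubled".
hallucination_multiplier = 1 + 98 / 100
print(f"Hallucination multiplier: {hallucination_multiplier:.2f}x")
```

On these numbers, accuracy fell by 15.0 percentage points, which is roughly an 18% decline relative to the earlier score.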

Claude had gotten dumber and the user experience had worsened. The cold, hard numbers settled every user's doubts:

It wasn't their problem; Claude Opus 4.6 had indeed gotten worse!

Claude users felt cheated!

Imagine if you relied on this model for any critical task, and they could directly replace it with a much worse model without informing you.

Users asked: "How can this possibly be legal?" Trust began to crumble, ridicule of Anthropic flooded the internet, and even the most loyal supporters began to waver.

But just as the entire internet was mocking them, Anthropic's trump card emerged—a suspected screenshot of an internal tool interface was leaked.

What the image showed instantly made all discussions about "Claude getting dumb" irrelevant—Claude Projects is testing a complete full-stack application building system.

Not helping you write code, but helping you build products.

While everyone was arguing over model scores, Anthropic had already changed the game table.

What's Hidden in the Leaked Image?

First, let's talk about what exactly that screenshot captured.

According to cross-verification from multiple sources, the leaked image shows a "one-click development kit" being tested internally by Claude Projects.

The interface clearly lists a row of pre-built templates: AI chatbot, interactive mini-game, business landing page, SaaS data dashboard... covering almost all the high-frequency demand scenarios for independent developers.

But the templates are just the surface.

What truly makes one gasp is the full-stack capability chain behind the templates—

Authentication? Check and configure.

Database? Select and build.

Front-end interface? Describe and generate.

Deployment and launch? One click and it's done.

This is not "AI-assisted programming." This is "AI-replacing programming," and it doesn't even need to distill your skills anymore.

Understanding the weight of this statement requires a clear view of the current landscape of AI programming tools.

  • Cursor's logic is "making you code faster in the IDE"—it optimizes coding speed, the programmer is still the protagonist.
  • Replit's logic is "enabling those who can't code to code"—it lowers the entry barrier, but you still need to understand code logic.
  • Vercel's logic is "making deployment feel seamless"—it solves the last mile, but you have to walk the previous road yourself.

Each of them tackles one link in the software development chain and pushes it to the limit.

But what Claude wants to do is on a completely different dimension from them.

Cursor makes programmers 10 times faster, Replit lets non-programmers code—but Claude wants to make "coding" itself obsolete.

The former is an efficiency revolution, the latter is category elimination.

According to leaked information, the underlying engine powering this system is precisely Opus 4.6—the model being mocked across the internet for "intelligence decline."

Mythos "Not Strong Enough" Might Be Intentional?

The most core, and perhaps most controversial, judgment might be—

Anthropic might not care at all where Mythos ranks on the leaderboard.

Does that sound like making excuses for the loser? Let's do the math.

When your strategic endgame is to become a "full-stack application platform," the role played by the model layer changes fundamentally.

It no longer needs to be "the smartest," it only needs to be "good enough."

The key to winning platform competition has never been about the horsepower of the underlying engine, but about the depth of stickiness in the upper-layer ecosystem.

Windows beat Mac not because the OS was more elegant, but because the software ecosystem was richer. Android crushed Windows Phone not because the kernel was more advanced, but because there were more developers.

In platform wars, "the best" is never the reason for winning; "the most used" is.

In public, Dario Amodei has repeatedly said one thing: "Coding will die."

But the leak of the full-stack builder gives this statement product-level physical evidence for the first time.

Dario wasn't making a prophecy. He was describing a roadmap being executed.

If this reasoning holds, the benchmark data points take on a completely different meaning: Mythos leads GPT-5.4 Pro on HLE (no tools, 56.8 vs 42.7), but has been caught on GPQA (94.4 vs 94.5) and overtaken on BrowseComp (89.3 vs 86.9).

It's not that "Anthropic lost," but that "Anthropic selectively stopped focusing effort here."

Should limited computing resources be invested in the leaderboard arms race to maintain an illusory "No. 1" label, or should they be tilted toward full-stack builders that can directly create commercial value?

For a company with annual revenue of $30 billion that needs to prove its commercialization capability to investors, the choice isn't difficult.

The model just needs to be good enough; platform lock-in is the moat.

The cruel truth of business competition is that users don't care whether your GPQA score is 94.4 or 94.5. Users care about one thing: "I say a sentence; can the app run?"

Fear After $30 Billion in Annual Revenue

Anthropic's annualized revenue just broke through $30 billion, surpassing OpenAI.

Anthropic's annualized revenue grew from $1 billion to $30 billion in 15 months
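As a rough illustration of what that pace implies, the sketch below computes the compound monthly growth rate needed to go from $1 billion to $30 billion in 15 months. It assumes smooth, constant compounding, which is a simplification for illustration and not something the source reports; the function name is hypothetical.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant monthly growth rate that turns `start` into `end` over `months`.

    Assumes smooth compounding, which real revenue rarely follows.
    """
    return (end / start) ** (1 / months) - 1


# Figures quoted in the article, in billions of dollars (annualized revenue).
rate = implied_monthly_growth(start=1.0, end=30.0, months=15)
print(f"Implied compound monthly growth: {rate:.1%}")
```

Under that assumption, annualized revenue would have had to grow by roughly 25% every month for 15 consecutive months.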

This is a number that would make any startup pop champagne.

But if you are Dario Amodei, your primary emotion right now isn't celebration, but fear.

Because the vast majority of this $30 billion comes from API calls. And APIs are essentially an extremely dangerous business model.

Why? Because APIs mean your customers are using your capabilities to build their own products.

Today they call Claude's API to build an AI customer service platform, tomorrow they build an AI writing tool, the day after they build an AI programming assistant.

Every successful customer is building their own skyscraper on your foundation. It sounds beautiful—until one day, another model company offers a cheaper, similarly usable API, and your customers collectively migrate overnight.

This is the "model commoditization" nightmare: when the differences at the model layer become smaller and smaller, API pricing becomes a price war with no winners.

OpenAI feels this fear, so it's frantically building consumer products: ChatGPT, GPTs, custom assistants. Google feels this fear, so it's stuffing Gemini into search, email, docs, and every one of its own products.

They are all doing the same thing: before models become as cheap as cabbage, turn themselves into a platform users cannot leave.

Anthropic's full-stack builder is the most radical version of this same logic.

Its subtext is:

Rather than wait for others to build a platform on top of my API and then kick me aside the day model prices drop, I'll build the platform myself first.

You don't need to call my API anymore; you can build Apps directly on my platform. Your user data is here, your workflow is here, your deployment environment is here. By then, if you want to change models? Sure, but your entire business has to start over.

This isn't product innovation; it's survival instinct.

The $30 billion in revenue proves Anthropic can make money, but the leak exposes Anthropic's true anxiety: just making money isn't enough; you have to become something customers cannot leave.

Conclusion: The Starry Sky and the Illusion

Let's step back from the business narrative and return to the origin of technical judgment.

The current top large models—whether Claude, GPT, or Gemini—are operating at about a 70% capability level. The climbing speed of this number in the past half year has visibly slowed down.

Moving from 70% to 100% doesn't rely on leaderboard grinding, nor on gaining a few more percentage points on the GPQA score. It relies on becoming an irreplaceable infrastructure—like the power grid, you don't care what turbine the power plant uses, you just know the light turns on when you flip the switch, the AC cools when you turn it on.

Anthropic's full-stack builder is the first time we've seen an AI company seriously thinking about this path of "infrastructuralization."

No longer obsessed with the vanity war of "my model is 0.1 points smarter than yours," but directly answering a more fundamental question: How can I get a billion people to use my stuff every day, without even realizing it?

Because what ultimately decides the AI endgame is never whose exam score is higher. It's who first becomes the power grid that no one can live without.

References:

https://x.com/cryptopunk7213/status/2043405326196867127

https://x.com/iruletheworldmo/status/2043332977136975994

https://x.com/marmaduke091/status/2043382991901147158

This article is from the WeChat public account "新智元" (New Wisdom Element), edited by: KingHZ

Related Questions

Q: What significant performance drop did Claude Opus 4.6 experience according to BridgeBench's report?

A: Claude Opus 4.6's global ranking dropped from 2nd to 10th place, with its accuracy plummeting from 83.3% to 68.3% and its hallucination rate nearly doubling, increasing by 98%.

Q: What major new capability was revealed by the leaked internal tool screenshot from Anthropic?

A: The leaked screenshot revealed Claude Projects, a full-stack application building system described as a "one-click development kit" capable of generating complete applications from templates, handling authentication, database setup, front-end generation, and deployment.

Q: According to the article, why might Anthropic be intentionally deprioritizing performance on benchmark leaderboards?

A: The article suggests Anthropic may be strategically shifting its limited computing resources away from benchmark competition to focus on developing its full-stack builder platform, prioritizing platform lock-in and commercial viability over having the "smartest" model.

Q: What fundamental shift in business strategy does the Claude Projects platform represent for Anthropic?

A: It represents a shift from providing an API service, which is vulnerable to commoditization and price competition, to becoming a full-stack platform that hosts entire development workflows, creating deeper customer lock-in and making the underlying model less replaceable.

Q: What is the article's perspective on the ultimate determinant of success in the AI industry?

A: The article argues that ultimate success won't be determined by which model has slightly higher benchmark scores, but rather by which company first becomes an indispensable infrastructure, like an electrical grid, that billions of people use daily without thinking about it.
