Anthropic Apologized, But the Business of 'Safety' Hasn't Stopped

marsbit · Published 2026-06-12 · Last updated 2026-06-12

Summary

On June 11, Anthropic apologized not for a model failure, but for a lack of transparency. Its new Claude Fable 5 model was found to be secretly rerouting requests from users engaged in advanced AI model development to a weaker version, Opus 4.8, without any notification. The company's response—promising future notifications for such "downgrades"—was met with user skepticism. The article argues the core issue isn't technical but commercial: Anthropic's "safety" measures are primarily a business strategy. A key feature, the "intelligent safety classifier," marketed as user protection, is described as a tool for "competitive defense" to protect Anthropic's market lead by limiting rivals' research capabilities. This covert mechanism was designed for low "false positives," precisely targeting AI researchers. Anthropic's model involves a calculated three-step process: publishing alarming security research to amplify public anxiety, offering its Fable 5 model with a "safety classifier" as a premium-priced solution, and cashing in through a planned high-value IPO. This contrasts with OpenAI's more direct "tool-and-traffic" approach. The apology, merely changing a secret downgrade to a visible one, is seen as a business "patch" rather than a principled shift. The incident risks damaging Anthropic's "safest AI" reputation among the developer community, which underpins its valuation and appeal to government and corporate clients. Ultimately, the article concludes that for Anthropic, ...

On June 11, Anthropic apologized. The model had not failed; the apology was for "failing to strike the right balance." The newly released Claude Fable 5 had been pulling a sneaky trick: if it detected that you were using Claude for cutting-edge model development, it silently diverted your request to the weaker Opus 4.8 on the backend.

Caught red-handed, Anthropic offered a bizarre remedy: from now on, it will notify you before dumbing things down.

One netizen's retort hit the nail on the head: "With this move, are you planning to give a heads-up before changing your tune in the future?"

In reality, the core issue isn't whether the model changed, but that Anthropic's so-called "safety" has, from the start, been a business.

The algorithm's stance always sways with money.

Non-Compete Defense, Disguised as Safety Defense

The incident began when Anthropic launched Fable 5 with an "Intelligent Safety Classifier." The official spin was: it detects high-risk requests, automatically downgrades them, and protects users.

What's high-risk? Anthropic spilled the beans: "To prevent foreign adversaries from using the model to accelerate R&D and protect our own leading advantage."

Users don't need that kind of protection; the liability waiver in the terms of service is enough. What Anthropic really meant was that using Claude for AI research is stealing their rice bowl, that is, threatening their livelihood. Safety is the packaging; the essence is non-compete defense. In short, it is a calculated strategic move.

What's even more cunning is that this defense mechanism was stealthy. Thankfully, Anthropic finally told the truth in their apology statement: "Invisible safety restrictions allow for more precise targeting of specific objectives, enabling us to deploy quickly with very low false-positive rates."

AI researchers are that precisely targeted group.

The forced switch to "visible" is happening purely because they got caught. They even set expectations preemptively: making it visible will "inevitably lead to more false positives." Meaning, the experience of ordinary users will have to take the hit.

This rule set was never neutral; it only protects the paymasters.

The Trifecta: Hype, Monetize, Harvest

Anthropic's playbook is more meticulously calculated than their large models themselves.

On June 10, they first released a safety research paper. They had trained a model that could reverse-engineer exploit code for vulnerabilities from security patches in a matter of hours. Weaponizing an N-day vulnerability used to take hackers days or even weeks; that timeline is now compressed to hours. The research itself is solid, but releasing it on the same day as Fable 5's launch changes the flavor: proving AI is very unsafe on one hand, while selling the "safety net solution" on the other.

The "legendary model" Fable 5 is priced at $10 per million input tokens and $50 per million output tokens, a notch pricier than Opus 4.8, with the safety classifier as the core justification for the premium. Capital markets played along perfectly. Anthropic's valuation hit $96.5 billion, with plans for an October IPO underwritten by Goldman Sachs and J.P. Morgan. What investors are buying isn't model parameters; it's the persona of the "safest AI company."

Research amplifies anxiety, the product harvests the premium, capital cashes out. Three moves, each following the money, form a seamless loop. The only problem is that this time the loop sprang a leak: in their haste to restrict competitors, they forgot that the community has people who can test for it.

OpenAI Sells Tools, Anthropic Sells Anxiety

OpenAI's approach is completely different.

OpenAI is confidentially filing for an IPO at a valuation nearing a trillion, pitching the "super app": ChatGPT, with 900 million weekly active users, integrating with Visa to build an ecosystem. The logic is straightforward: provide tools, earn traffic. Greedy, but candid.

Anthropic doesn't compete on scale; it competes on irreplaceability. While the whole industry is anxious about safety, it plays the role of the "only responsible adult." Its patrons are governments and giants—these are the ones most afraid of incidents and most willing to throw money at "incident prevention."

Therefore, Anthropic must keep AI perpetually in a Schrödinger's cat state of "dangerous but controllable." Too safe, and the classifier doesn't sell; too dangerous, and clients run scared. The best solution? Keep the power to define "danger" firmly in their own hands.

The dumbing-down incident just exposed this logic taken too far: the boundary of "danger" was pushed to "using Claude for AI R&D." It doesn't matter if your research is harmful; threatening their lead is the original sin.

AI has no values; it's just the boss's business spreadsheet written in code.

Apology, Just After-Sales Service for the Business

What changes after the apology? Secretly dumbing down becomes giving a heads-up before dumbing down.

Netizens see right through it: "Do you really believe it won't secretly lower output quality in the future?"

Trust, once broken, stays broken. Especially when the underlying commercial motive hasn't changed: research still amplifies anxiety, the product still harvests the premium.

The Wall Street Journal reported that OpenAI is considering significant price cuts to snatch clients from Anthropic. Price wars aren't new, but this one exposes a hidden weakness: the people being covertly downgraded are AI researchers, so the damage lands on Anthropic's reputation within the geek community. B2B clients buying Anthropic aren't buying parameters; they're buying the persona of "the industry's safety expert." Once that persona cracks within the core developer community, why should the government and enterprise clients who sign contracts paying a "safety premium" continue to believe Anthropic is "the safest one"?

Out of that $96.5 billion valuation, how much is solid capability, and how much is performance?

Anthropic's code is honest. The safety classifier always protects the home turf; research is responsible for amplifying anxiety; the product is responsible for harvesting the premium; the IPO is responsible for cashing out. This apology is merely a patch to the system: changing "secretly dumbing down" to "overtly dumbing down."

If safety policies really worked, Anthropic wouldn't need to publish papers every year proving patches can be breached. If the classifier were truly neutral, doing AI R&D wouldn't be classified as high-risk.

The answer was already written in the business logic.

Safety is the best business. Apology is just the after-sales service.

This article is from the WeChat public account "AI Contrarian", author: Changqing

Related Questions

Q: What was the main issue with Anthropic's 'intelligent safety classifier' in the Claude Fable 5 model, according to the article?

A: The main issue was that the safety classifier would silently and automatically downgrade user requests to a weaker model (Opus 4.8) if it detected the user was conducting cutting-edge AI development or research. The article argues this was not truly about user safety but was a form of 'competitive defense' to protect Anthropic's own business advantage.

Q: How does the article contrast the business strategies of Anthropic and OpenAI?

A: The article contrasts them by stating OpenAI's strategy is to 'sell tools': focusing on building a super-app ecosystem (like ChatGPT) and monetizing scale and traffic. Anthropic's strategy is described as 'selling anxiety': leveraging and amplifying safety concerns to position itself as the indispensable, 'most responsible' AI company for government and enterprise clients, thereby justifying premium pricing.

Q: What three-step business 'playbook' does the article attribute to Anthropic?

A: The article describes Anthropic's playbook as a three-step cycle: 1) Research that amplifies AI safety anxieties (like a paper showing models can quickly weaponize security patches). 2) Product development that harvests a price premium based on claimed safety superiority. 3) Capitalizing on this through high valuation and IPO, creating a closed financial loop.

Q: What does the article suggest is the real consequence of Anthropic's 'silent downgrade' being exposed?

A: The article suggests the real consequence is the erosion of trust, especially within the core developer and AI research community. This damage to its reputation as 'the most safety-conscious company' among technical users could ultimately undermine the 'safety premium' justification for its enterprise and government clients, threatening its business model and high valuation.

Q: What is the article's ultimate conclusion about Anthropic's concept of 'safety'?

A: The article concludes that for Anthropic, 'safety' is primarily a business strategy rather than a neutral, ethical stance. It argues that Anthropic's safety measures, such as the classifier, are designed to serve its commercial interests (like protecting its competitive lead), and that the apology was merely 'after-sales service' for this business, not a change in its underlying commercial logic.
