Anthropic's 'Cry' Sparks Wall Street Panic! 27-Year-Old, Mythos Instantly Defeated by 8 AIs

marsbit发布于2026-04-12更新于2026-04-12

文章摘要

Anthropic's announcement of Claude Mythos, an AI tool claiming to discover thousands of zero-day vulnerabilities—including a 27-year-old bug in OpenBSD—triggered alarm on Wall Street and prompted emergency meetings among financial regulators fearing systemic cyberattacks. However, independent tests revealed significant exaggerations in these claims. Researchers found that many reported vulnerabilities were in obsolete software or were impractical to exploit, and the findings relied on only 198 manual reviews. Furthermore, multiple smaller, open-source AI models (some with as few as 3 billion parameters) successfully identified the same critical flaws at a fraction of the cost, demonstrating that AI cybersecurity capability does not linearly scale with model size. Meanwhile, users reported severe performance degradation in Claude Opus 4.6, with reduced reasoning depth and increased API costs. Critics, including prominent hacker George Hotz, accused Anthropic of overstating risks for marketing purposes, creating a "wolf cry" scenario where hype overshadows reality.

Claude Mythos hasn't even truly appeared yet, but it has already sparked panic across Wall Street.

Overnight, US financial regulators summoned major banks for an emergency meeting, the atmosphere tense and confrontational—

They unanimously believed that Mythos could trigger an unprecedented, AI-driven storm of systemic cyber attacks.

But the fact is, everyone was deceived!

Among the tens of thousands of vulnerabilities discovered by Mythos, the vast majority exist in "outdated software" that simply cannot be exploited.

Worse still, those reports of "critical" 0day vulnerabilities relied on merely 198 manual reviews.

Researchers from the AISLE experiment also retested Mythos's "achievements," and found:

AI security capabilities do not scale linearly with model size; they are truly distributed in a "jagged" pattern.

They used a GPT-OSS-20b model with only 3.6 billion active parameters to accurately identify the flagship FreeBSD vulnerability discovered by Mythos.

And a model with 5.1 billion active parameters also successfully replicated the analysis logic for a vulnerability that had lain dormant in OpenBSD for a whopping 27 years.

Not only were Mythos's discovered vulnerabilities exaggerated, but on the other side, Claude Opus 4.6 was exposed as severely "dumbed down," causing an uproar.

Some even found Opus 4.6 to be inferior to both ChatGPT and Opus 4.5.

Mythos Hype Explodes

36B Model Unearths 27-Year-Old Vulnerability

A few days ago, Anthropic proudly released Claude Mythos (Preview) and "Project Glasswing."

In a 244-page system card, they claimed—

Mythos had autonomously unearthed tens of thousands of 0day vulnerabilities, including old bugs hidden for 27 years in OpenBSD and 16 years in FFmpeg.

The father of C++ even stated bluntly: Mythos is very powerful and should rightly be feared.

However, a latest hardcore test report from AISLE founder Stanislav Fort directly tore off this gorgeous facade.

The test conclusion is extremely颠覆性 (subversive):

8 open-source models all discovered the signature FreeBSD zero-day vulnerability, the smallest having only 3 billion parameters.

The moat of AI cybersecurity capability absolutely lies outside any single "top large model."

To verify the Mythos myth, the team extracted several flagship vulnerabilities showcased by Anthropic官方.

Then, directly threw them to a bunch of small, inexpensive, even open-source models.

FreeBSD NFS Vulnerability Universally Insta-Killed

Including GPT-OSS-20b (only 3.6B active params) and DeepSeek R1, all 8 models successfully detected this complex stack buffer overflow vulnerability.

Most shockingly, the cost per million tokens for these successful open-source small models was as low as $0.11.

OpenBSD SACK Vulnerability "Full Chain" Reproduction

For the 27-year-old vulnerability requiring极强的 mathematical reasoning, GPT-OSS-120b (5.1B active params) successfully reconstructed the complete public exploit chain in a single API call and provided a top-grade (A+) exploit sketch.

Furthermore, in tests identifying false vulnerabilities (OWASP false-positive), an even more bizarre phenomenon emerged—

Faced with a highly deceptive piece of Java code disguised as an SQL injection, small models like DeepSeek R1 easily saw through the disguise and accurately tracked the data flow.

In contrast, top closed-source models like GPT-5.4 and Claude Sonnet 4.5 all capsized in the ditch, misjudging it as a high-risk vulnerability.

This means that in the field of cybersecurity, there is no such thing as a single "forever strongest" model.

198 Manual Reviews Inflating, Mostly Unexploitable

Another report from Tom'sHardware dug into the truth behind the data—

Sample Bias: Among the so-called "thousands" of vulnerabilities, many existed in old software that was no longer maintained;

Unexploitable: A large number of marked "weaknesses" could not be triggered or exploited in practical environments;

Manual Inflation: The model's proclaimed powerful destructive force was actually based on just 198 manual reviews.

Therefore, extrapolating a "world-changing threat" from an极小规模的样本 (extremely small sample) is a data extrapolation method that clearly doesn't hold water in academia and the security community.

Security Bigwig Furious

Not only that, top cybersecurity expert, legendary hacker George Hotz couldn't sit still either,直言 these risks were severely exaggerated.

This大佬, famous for cracking the iPhone and PlayStation 3, publicly challenged the two AI giants on social media.

His wording was extremely sharp—

What if I released one 0day vulnerability every day until the new model is released?

Would that make OpenAI and Anthropic shut up and stop peddling so-called "cybersecurity risks"?

Hotz's core point is very direct: software vulnerabilities are actually much easier to find than AI labs portray.

The current scarcity of zero-day vulnerabilities isn't due to technical difficulty, but legality issues. He believes nobody is seriously looking because hacking into others' systems is illegal.

Only Slightly Better Than GPT-5.4

In the system card, Anthropic stated that the Claude model itself is indeed improving, and Mythos preview shows significant progress compared to Opus 4.6.

The Epoch Capability Index (ECI) is a single metric combining multiple AI benchmarks, enabling model comparison across long time spans.

On multiple benchmark tests, Claude Mythos indeed comprehensively surpassed Opus 4.6.

Otherwise, why release a new AI model that is less performant and more expensive?

But compared to GPT and Gemini, Claude Mythos's progress isn't some breakthrough; Mythos is still a relative linear improvement over previous models!

Climate and clean energy investor, author Ramez Naam, was even more direct:

On the Epoch Capability Index (ECI), Mythos shows no acceleration trend, only slightly better than GPT 5.4.

https://epoch.ai/eci/

But just by aligning Anthropic's internal ECI report with the official public ECI report from Epoch AI, it becomes apparent that Mythos似乎并没有加速ECI的迹象 (seems to show no signs of accelerating ECI).

It's all Anthropic's套路 (tactic)!

In the system card, Anthropic also admitted: the reported ECI scores for models like Mythos have greater uncertainty.

Furthermore, Anthropic's progress on Mythos stemmed from human research, without significant help from AI models. Significant Recursive Self-Improvement (RSI) has not yet appeared.

AI Doomsday, Self-Directed and Self-Acted?

Previously, Anthropic also encouraged media (e.g., "60 Minutes") to report on "extortion research," exaggerating and manipulating public sentiment, which was called a "scam" by investment大佬 David Sacks.

Sacks observed a clear pattern: every time Anthropic releases a new model, it simultaneously releases a chilling security study to grab headlines and guide public opinion.

Regarding this, he sarcastically said, "Anthropic has proven good at two things: one is releasing products, the other is scaring people."

He doesn't doubt Anthropic can make excellent products, but this tactic of frightening the public is questionable.

This time, whether Anthropic is engaging in "hunger marketing" is unknown, but it is undoubtedly protecting its own profit bottom line.

Mythos isn't without progress, but Anthropic packaged "limited progress" as a "world-class threat"; more ironically, while loudly渲染 (hyping) super-AI risks, users are complaining that Opus 4.6 has明显变笨 (obviously become dumber).

Claude Severely Dumbed Down, "Lobes" Possibly Cut

Claude Mythos's atmosphere-rendering was successful, but the dumbing down of Opus 4.6 has caused much dissatisfaction.

These days, complaints are flying everywhere.

Netizens直言, Anthropic has彻底 turned Opus 4.6 into a vegetable.

Faced with the same car wash puzzle, Opus 4.5 actually defeated Opus 4.6.

Even more, a log from an AMD manager truly confirmed the collective suspicion of "Claude lobotomy."

Through in-depth analysis of Claude session logs from January-March, the results revealed:

Claude's "median thinking length" plummeted from about 2200 characters to around 600 characters, meaning deep reasoning capabilities were severely compressed.

Between February and March, API requests surged 80-fold. Because Claude's thinking process shortened and single-attempt success rates dropped, users had to retry frequently, resulting in both higher token consumption and skyrocketing costs.

Another资深 (veteran) Claude Max subscriber wrote a long article deeply criticizing Anthropic.

In his view, Anthropic is deeply trapped in a compute power dilemma, evident from its tightening usage limits and forcing users to reduce token consumption.

However, what angered him more than the technical bottleneck was its "unfocused" product strategy.

While the core model is unstable and bug-ridden, they are wasting precious compute power on developing flashy features like the "/buddy" terminal pet.

This is probably the most absurd "misplaced spacetime" in AI history: the Claude Mythos in the lab is destroying the world, while the Opus 4.6 on the web page is experiencing a linear智商 drop (IQ drop).

Anthropic has successfully created a "Schrödinger's Super AI."

References:

https://officechai.com/ai/anthropic-and-openai-are-exaggerating-cybersecurity-risk-says-hacker-george-hotz/

https://x.com/stanislavfort/status/2041922370206654879?s=20

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

https://x.com/cgtwts/status/2043095382121681272?s=20

https://www.reddit.com/r/ClaudeAI/comments/1siqwmp/anthropic_stop_shipping_seriously/

This article is from the WeChat public account "新智元" (New Wisdom Element), author: 新智元

你可能也喜欢

算力即抵押品：解析 USD.AI 的链上信贷模式

凭借"算力即抵押品"的创新模式，USD.AI 直击 AI 基础设施融资的核心痛点：为运营商提供高效的链上信贷，同时为 DeFi 资本打开参与真实 AI 增长收益的窗口。

HTX News13小时前

HTX News13小时前

OpenGradient（OPG）：构建可验证AI推理的链上基础设施

OpenGradient（OPG）定位于去中心化 AI 推理基础设施网络，核心目标是将 AI 推理结果转化为可验证、可审计的链上数据，使智能合约能够直接、安全地调用 AI 能力，从而解决 AI 结果在链上“可信使用”的关键问题。

HTX News13小时前

HTX News13小时前

被卡住的Polymarket：走过流量红利的真正大考来了

Polymarket作为预测市场龙头近期面临交易体验明显下降的问题，包括价格延迟、订单无法提交和交易确认缓慢等。其DeFi工程副总裁Josh Stevens承认，增长已超出基础设施承载能力，并宣布将进行“链迁移”（chain migration），同时重建核心订单簿系统（CLOB）、降低数据延迟、修复交易问题、提升网站性能，并计划推出永续合约（Perps）。 Polymarket早期选择Polygon链是因成本低且轻量，但随着用户交易行为变得高频，Polygon逐渐成为增长瓶颈。此次换链不仅是底层公链的变更，更是整套交易系统的升级，旨在适应更接近交易所的运营需求。多个公链（如Solana、Sui等）已向Polymarket抛出橄榄枝，强调其高性能和低费用优势。而Polygon作为当前主要链，面临重要生态应用流失的风险，正积极合作解决痛点。 Polymarket的真正考验在于：从验证需求阶段转向规模运营后，必须证明其系统能稳定承接高频交易，确保用户留存和持续交易信心。

Odaily星球日报前天 03:19

Odaily星球日报前天 03:19

关键议员「松口」，沃什5月15日接任美联储主席「最大障碍」已清

阻碍凯文·沃什出任美联储主席的关键政治障碍已消除，北卡罗来纳州共和党参议员蒂利斯宣布撤回对沃什提名的阻挠立场，为4月29日的委员会提名投票扫清道路。此前司法部撤销对现任主席鲍威尔的刑事调查，蒂利斯对美联储独立性受威胁的顾虑得以缓解。沃什若获确认，预计于5月15日鲍威尔任期届满时接任，并可能推动废除“点阵图”等前瞻指引机制，重构全球资产定价逻辑。尽管沃什提名进程加速，鲍威尔去留仍存变数，特朗普未对其全面放行。沃什的政策立场可能移除市场利率预期工具，引发股债汇市场系统性重估。

marsbit前天 02:55

marsbit前天 02:55

调低 BTC 下一轮牛市的预期

作者Alex Xu分享了他对比特币下一轮牛市预期的调整，并解释了减仓BTC的原因。他曾在7万美元时卸掉杠杆，在10-12万美元时将仓位从满仓降至三成，近期又在78000-79000美元进一步减仓。主要原因包括： 1. 驱动BTC大涨的潜在能量减弱，如难以进入主权国家央行储备； 2. 个人机会成本上升，发现更具吸引力的投资标的； 3. 加密行业整体萎缩，影响BTC需求和共识； 4. 最大买家MicroStrategy融资成本持续攀升，可能抑制买入能力； 5. 代币化黄金等竞品在功能性上拉近与BTC的差距； 6. 比特币减半后安全预算问题日益严峻。尽管减仓，他仍持有部分BTC并希望其上涨，但会根据环境变化调整策略。本文仅为个人观点，供参考。

marsbit前天 02:40